Let $\{X_k, k \in \mathbb{Z}\}$ be an autoregressive process of order $q$.
Various estimators for the order $q$ and the parameters
$\Theta_q = (\theta_1, \ldots, \theta_q)^T$ are known; the order is usually
determined with Akaike's criterion or related modifications, whereas
Yule-Walker, Burg or maximum likelihood estimators are used for the
parameters $\Theta_q$. In this paper, we establish simultaneous confidence
bands for the Yule-Walker estimators $\hat\theta_i$; more precisely, it is
shown that the limiting distribution of
$\max_{1 \le i \le d_n} |\hat\theta_i - \theta_i|$ is the Gumbel-type
distribution $e^{-e^{-z}}$, where $q \in \{0, \ldots, d_n\}$ and
$d_n = O(n^{\delta})$, $\delta > 0$. This allows us to modify some of the
currently used criteria (AIC, BIC, HQC, SIC), but also yields a new class of
consistent estimators for the order $q$. These estimators seem to have some
potential, since they outperform most of the previously mentioned criteria
in a small simulation study. In particular, if some of the parameters
$\{\theta_i\}_{1 \le i \le d_n}$ are zero or close to zero, a significant
improvement can be observed. As a byproduct, it is shown that BIC, HQC and
SIC are consistent for $q \in \{0, \ldots, d_n\}$ where $d_n = O(n^{\delta})$.

1. Introduction. Let $\{X_k\}_{k \in \mathbb{Z}}$ be a $q$th-order
autoregressive process AR($q$) with coefficient vector
$\Theta_q \in \mathbb{R}^q$. A considerable literature in the past years has
dealt with various aspects and problems of AR($q$)-processes; see, for
instance, [4, 17, 23, 29] and the references therein. More recently, people
have moved on to more complicated models such as ARCH [14, 19], GARCH [13]
and related models, which again have been extended in many different
directions. However, in many applications, AR($q$)-processes still form the
backbone and are often used as first approximations for further analysis; in
particular, many estimation and fitting procedures can be based on
preliminary AR($q$) approximations. This includes, for instance, ARMA, ARCH
and GARCH models [11, 22, 24]. Thus, AR($q$) processes have moved from the
spotlight to the backstage area, yet their significance remains unchallenged.

When fitting an AR($q$) model, two important questions arise: how to
choose the order $q$, and having done so, which estimators are to be used.
Naturally, these two problems can hardly be separated and are often dealt
with simultaneously, or at least so in preliminary estimates. An extensive
literature has evolved around these two issues. Pioneering contributions in
this direction are due to Akaike [1, 2], Mallows [30, 31], Walker [44] and
Yule [49]; for more details we refer to [4, 15, 17, 23, 29] and the references
there. In order to be able to describe some of the basic results, we recall
that an AR($q$) process $\{X_k\}_{k \in \mathbb{Z}}$ is defined through the
recurrence relation

(1.1) $X_k = \theta_1 X_{k-1} + \cdots + \theta_q X_{k-q} + \varepsilon_k,$

where it is often assumed that $\{\varepsilon_k\}_{k \in \mathbb{Z}}$ is a
mean-zero i.i.d. sequence. Let $\phi_h = E(X_k X_{k+h})$,
$k, h \in \mathbb{Z}$, be the covariance function. A natural estimate for
$\phi_h$ is the sample covariance
$\hat\phi_{n,h} = n^{-1} \sum_{i=h+1}^{n} X_i X_{i-h}$. Depending on the
magnitude of $h$, a different normalization, such as $(n - h)^{-1}$, is
sometimes more convenient. Denote with
$\Theta_q = (\theta_1, \ldots, \theta_q)^T$ the parameter vector, put
$\Phi_q = (\phi_1, \ldots, \phi_q)^T$ and let
$\Gamma_q = (\phi_{|i-j|})_{1 \le i,j \le q}$ be the $(q \times q)$-dimensional
covariance matrix. Then it follows from (1.1) that
$\Gamma_q \Theta_q = \Phi_q$, hence a natural idea is to replace the
corresponding quantities by estimators
$\hat\Phi_q = (\hat\phi_{n,1}, \ldots, \hat\phi_{n,q})^T$,
$\hat\Gamma_q = (\hat\phi_{n,|i-j|})_{1 \le i,j \le q}$, and thus define the
estimator $\hat\Theta_q = (\hat\theta_1, \ldots, \hat\theta_q)^T$ via

(1.2) $\hat\Gamma_q^{-1} \hat\Phi_q = \hat\Theta_q$ and $\hat\sigma^2(q) = \hat\phi_{n,0} - \hat\Theta_q^T \hat\Phi_q,$

where $\sigma^2 = E(\varepsilon_0^2)$. These estimators are commonly referred
to as the Yule-Walker estimators, and they have some remarkable properties.
For example, if $\{X_k\}_{k \in \mathbb{Z}}$ is causal, then the fitted model

$X_k = \hat\theta_1 X_{k-1} + \cdots + \hat\theta_q X_{k-q} + \varepsilon_k$

is still causal; see, for instance, [17] and [34]. Another interesting feature
is that even though the Yule-Walker estimators are obtained via moment
matching methods, their variance is asymptotically equivalent to that of
estimators obtained via a maximum likelihood approach. More precisely, for
$m \ge q$ it holds that

(1.3) $\sqrt{n}(\hat\Theta_m - \Theta_m) \xrightarrow{d} N(0, \sigma^2 \Gamma_m^{-1}),$

where $\Theta_m = (\theta_1, \ldots, \theta_q, 0, \ldots, 0)^T$; see, for
instance, [17]. These asymptotic results form the basis for earlier
estimation methods of the order $q$ [37, 43, 45], which focused on a fixed,
finite number of possible orders and consist of multiple-testing procedures,
which in practice makes it difficult to maintain a required level. On the
other hand, as was pointed out by Schwarz [39], a direct likelihood approach
fails, since it invariably chooses the highest possible dimension.
Akaike [1] and Mallows [30, 31] developed a different approach, which is
based on a "generalized likelihood function." Shibata [41] investigated the
asymptotic distribution and showed that the estimator based on (1.4) is not
consistent. This issue was successfully dealt with by Akaike [2] (BIC),
Hannan and Quinn [25] (HQC), Parzen [36], Rissanen [38] and Schwarz [39]
(SIC), who introduced consistent modifications (Parzen's CAT-criterion is
conceptually different). For more recent advances and generalizations, see,
for instance, Barron et al. [6], Foster and George [20], Shao [40] and the
detailed review on model selection given by Leeb and Pötscher [28]. A
particularly interesting direction addresses AR($\infty$) approximations;
recent contributions are due to Bickel and Gel [12] and Ing and Wei [26, 27].
Here we content ourselves with briefly discussing Akaike's approach and
closely related criteria. Akaike's generalized likelihood function leads to
the expression

(1.4) $\mathrm{AIC}(m) = n \log \hat\sigma^2(m) + 2m,$

where $n$ is the sample size and $\hat\sigma^2(m)$ is as in (1.2). An
estimator for the order $q$ is then obtained by minimizing
$\mathrm{AIC}(m)$, $m \in \{0, 1, \ldots, K\}$, for some predefined $K$ with
$0 \le q \le K$. Consistent modifications are obtained by inserting an
increasing sequence $C_n$, and $\mathrm{AIC}(m)$ then becomes

(1.5) $\mathrm{AIC}(m) = n \log \hat\sigma^2(m) + 2 C_n m, \quad m \in \{0, 1, \ldots, K\}.$

Most modifications result in $C_n = O(\log n)$, even though the arguments are
sometimes quite different. A notable exception is the idea of Hannan and
Quinn [25], who successfully employed the law of the iterated logarithm (LIL)
to obtain $C_n = O(\log\log n)$.
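
The Yule-Walker equations (1.2) and the criteria (1.4) and (1.5) can be sketched numerically as follows. This is only an illustration under stated assumptions: the function names `sample_autocov`, `yule_walker` and `select_order` are hypothetical, the data are taken to be mean zero as in (1.1), and NumPy is assumed to be available.

```python
import numpy as np

def sample_autocov(x, max_lag):
    """phi_hat_{n,h} = n^{-1} * sum_{i=h+1}^{n} X_i X_{i-h} for h = 0..max_lag."""
    x = np.asarray(x, dtype=float)
    n = x.size
    return np.array([x[h:] @ x[:n - h] / n for h in range(max_lag + 1)])

def yule_walker(phi, m):
    """Solve Gamma_m Theta_m = Phi_m; return (Theta_hat_m, sigma2_hat(m)) as in (1.2)."""
    if m == 0:
        return np.empty(0), phi[0]
    idx = np.abs(np.subtract.outer(np.arange(m), np.arange(m)))
    gamma = phi[idx]                      # Toeplitz matrix (phi_{|i-j|})
    theta = np.linalg.solve(gamma, phi[1:m + 1])
    return theta, phi[0] - theta @ phi[1:m + 1]

def select_order(x, K, C_n=1.0):
    """Minimise n*log(sigma2_hat(m)) + 2*C_n*m over m = 0..K; C_n = 1 gives (1.4)."""
    n = len(x)
    phi = sample_autocov(x, K)
    crit = [n * np.log(yule_walker(phi, m)[1]) + 2.0 * C_n * m
            for m in range(K + 1)]
    return int(np.argmin(crit))
```

Passing an increasing `C_n` (for example a multiple of `log(n)`) corresponds to the modified criterion (1.5).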

The aim of this paper is to introduce a different approach, based on
the quantity $\max_{1 \le i \le d_n} |\hat\theta_i - \theta_i|$, where $d_n$
is an increasing function in $n$. It is shown, for instance, that,
appropriately normalized, this expression converges weakly to a Gumbel-type
distribution. On the one hand, this allows us to construct simultaneous
confidence bands for the Yule-Walker estimators $\hat\Theta_{d_n}$; on the
other hand, it permits us to construct a variety of different, consistent
estimators for the order $q$ of an autoregressive process. The asymptotic
distribution of such a particular estimator is also derived. As a byproduct,
it is shown that known consistent criteria such as BIC, SIC and HQC are also
consistent if the parameter space is increasing; that is, consistency even
holds if $q \in \{0, \ldots, d_n\}$, where $d_n = O(n^{\delta})$. This
partially answers questions raised by Hannan and Quinn [25] and
Shibata [41], and extends results given by An et al. [3]. In addition, the
general method seems to be very useful for model fitting for subset
autoregressive processes (see, e.g., [33]), which is highlighted in
Remark 2.11 and Section 3. A more thorough treatment of this issue is
postponed to a subsequent paper.

2. Main results. We will frequently use the following notation. For a
vector $x = (x_1, \ldots, x_d)^T$, we put
$\|x\|_\infty = \max_{1 \le i \le d} |x_i|$, and for a matrix
$A = (a_{i,j})_{1 \le i \le r, 1 \le j \le s}$, $r, s \in \mathbb{N}$, we
denote with

(2.1) $\|A\|_\infty = \max\{\|Ax\|_\infty \mid x \in \mathbb{R}^s, \|x\|_\infty = 1\} = \max_{1 \le i \le r} \sum_{j=1}^{s} |a_{i,j}|$

the usual induced matrix norm. In addition, we will use the abbreviation
$\|\cdot\|_p = (E(|\cdot|^p))^{1/p}$, $p < \infty$. The main results involve
an array of AR($q$) processes; more precisely, we consider the family of
AR($d_n$) processes $\{X_k^{(r)}\}_{k \in \mathbb{Z}}$, $1 \le r \le d_n$,
where $d_n = O(n^{\delta})$ (more details are given later). Since we are
always only dealing with a single member of this array, the index $(r)$ is
dropped for convenience, and we just consider an AR($d_n$) process
$\{X_k\}_{k \in \mathbb{Z}}$, keeping in mind that the parameters
$\{\theta_i\}_{1 \le i \le d_n}$ may depend on $n$. This implies that
$\{X_k\}_{k \in \mathbb{Z}}$ satisfies the recurrence relation

(2.2) $X_k = \theta_1 X_{k-1} + \cdots + \theta_{d_n} X_{k-d_n} + \varepsilon_k, \quad k \in \mathbb{Z},$

where $\{\varepsilon_k\}_{k \in \mathbb{Z}}$ defines the usual innovations.
Note that $d_n$ does not need to reflect the actual order $q$ of the
AR($d_n$) process, as we do not require that
$\{\theta_i\}_{1 \le i \le d_n}$ are all different from zero. All of the
results are derived under the following assumption regarding the AR($d_n$)
process $\{X_k\}_{k \in \mathbb{Z}}$.

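Assumption 2.1. $\{X_k\}_{k \in \mathbb{Z}}$ admits a causal representation
$X_k = \sum_{i=0}^{\infty} \alpha_i \varepsilon_{k-i}$, such that:

• $\sup_n \Psi(m) = O(m^{-\vartheta})$, $\vartheta > 0$, where $\Psi(m) := \sum_{i=m}^{\infty} |\alpha_i|$,

• $\{\varepsilon_k\}_{k \in \mathbb{Z}}$ is a mean-zero i.i.d. sequence of random variables, such that
$\|\varepsilon_k\|_p < \infty$ for some $p > 4$, $\|\varepsilon_k\|_2^2 = \sigma^2 > 0$, $k \in \mathbb{Z}$,

• $\sup_n \sum_{i=1}^{d_n} |\theta_i| < \infty$, $|\theta_{d_n}| = o((\log n)^{-1})$.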

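In accordance with the previously established notation, we introduce the
inverse and estimated inverse matrix

(2.3) $\Gamma_{d_n}^{-1} = (\gamma^*_{i,j})_{1 \le i,j \le d_n}, \quad \hat\Gamma_{d_n}^{-1} = (\hat\gamma^*_{i,j})_{1 \le i,j \le d_n}.$

In addition, we will use the convention that $\theta_0 = \hat\theta_0 = -1$.
We can now formulate our main result.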

Theorem 2.2. Let $\{X_k\}_{k \in \mathbb{Z}}$ be an AR($d_n$) process
satisfying Assumption 2.1. Suppose that $d_n \to \infty$ as $n$ increases,
with $d_n = O(n^{\delta})$ such that

(2.4) $0 < \delta < \min\{1/2, \vartheta p/2\}, \quad (1 - 2\vartheta)\delta < (p - 4)/p.$

If we have in addition that $\inf_h |\gamma^*_{h,h}| > 0$, then for $z \in \mathbb{R}$

$P\bigl(a_n^{-1}\bigl(\sqrt{n} \max_{1 \le i \le d_n} |(\hat\gamma^*_{i,i}\hat\sigma^2(d_n))^{-1/2}(\hat\theta_i - \theta_i)| - b_n\bigr) \le z\bigr) \to \exp(-e^{-z}),$

where $a_n = (2 \log d_n)^{-1/2}$ and
$b_n = (2 \log d_n)^{1/2} - (8 \log d_n)^{-1/2}(\log\log d_n + 4\pi - 4)$.

Remark 2.3. Condition $\inf_h |\gamma^*_{h,h}| > 0$ may be explicitly
expressed in terms of $\{\theta_i\}_{1 \le i \le d_n}$ [see (4.1)], and is
quite general. In fact, it is only needed to control or exclude possible
pathological cases.

Remark 2.4. Note that if we have $|\alpha_i| = O(i^{-3/2})$, then
$\vartheta \ge 1/2$. Hence condition $p > 4$ implies that we may choose
$\delta$ arbitrarily close to $1/2$, which essentially results in
$d_n = o(\sqrt{n})$.

The above remark indicates that we may obtain simple bounds for $d_n$,
provided that we can control $\Psi(m)$ asymptotically. If the cardinality of
the set $\{1 \le i \le d_n \mid \theta_i \ne 0\}$ tends to infinity as $n$
increases, then establishing general and simple conditions on the relation
between $\{\theta_i\}_{1 \le i \le d_n}$ and $\Psi(m)$ seems to be very
difficult. One may, however, obtain the following corollary.

Corollary 2.5. Suppose that $\{\varepsilon_k\}_{k \in \mathbb{Z}}$ is a
mean-zero i.i.d. sequence of random variables, such that
$\|\varepsilon_k\|_p < \infty$ for some $p > 4$,
$\|\varepsilon_k\|_2^2 = \sigma^2 > 0$, $k \in \mathbb{Z}$, and that one of
the following conditions holds:

(i) $\sup_n \sum_{i=1}^{d_n} |\theta_i| < 1$,

(ii) $\theta_i = 0$, $q < i \le d_n$, for some fixed $q \in \mathbb{N}$ which does not depend on $n$.

Then the conditions of Theorem 2.2 are satisfied and we can choose any
$d_n = O(n^{\delta})$ with $\delta < 1/2$.

Remark 2.6. The rate of convergence to an extreme-value type distribution
as given in Theorem 2.2 can be rather slow; see, for instance, [5, 35].
Hence, in view of (1.3) (and Theorem 6.1), it may be more appropriate to
use the approximation

$P\bigl(\sqrt{n} \max_{1 \le i \le d_n} |(\hat\gamma^*_{i,i}\hat\sigma^2)^{-1/2}(\hat\theta_i - \theta_i)| \le x\bigr) \approx P(\|\xi_{d_n}\|_\infty \le x)$

in practice, where $\xi_{d_n} = (\xi_{n,1}, \ldots, \xi_{n,d_n})^T$ is a
$d_n$-dimensional mean-zero Gaussian random vector with the same covariance
structure. Corresponding quantiles can be obtained, for instance, via a
Monte Carlo technique. However, if $d_n$ is sufficiently large, one has that

$P(\|\xi_{d_n}\|_\infty \le x) \approx P(\|\eta_{d_n}\|_\infty \le x),$

where $\eta_{d_n} = (\eta_{n,1}, \ldots, \eta_{n,d_n})^T$ is a sequence of
i.i.d. mean-zero Gaussian random variables with unit variance. A bound for
the error can be given by using the techniques developed by Berman [9] and
Deo [18]; see also the proof of Theorem 2.8.
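
The Monte Carlo step mentioned in Remark 2.6 might look as follows. This is a minimal sketch, not the author's procedure: the function name `max_abs_quantile_mc`, the number of draws and the seed are assumptions, and the covariance matrix of $\xi_{d_n}$ has to be supplied by the user.

```python
import numpy as np

def max_abs_quantile_mc(cov, level, n_draws=100_000, seed=0):
    """Monte Carlo estimate of the `level`-quantile of ||xi||_inf for xi ~ N(0, cov)."""
    cov = np.asarray(cov, dtype=float)
    rng = np.random.default_rng(seed)
    draws = rng.multivariate_normal(np.zeros(cov.shape[0]), cov, size=n_draws)
    # Largest absolute coordinate of every simulated vector.
    sup_norms = np.abs(draws).max(axis=1)
    return float(np.quantile(sup_norms, level))
```

With `cov = np.eye(d)` the function approximates the quantiles of $\|\eta_{d_n}\|_\infty$, the i.i.d. case of the second approximation.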

The above results allow us to construct the simultaneous confidence bands

(2.5) $\mathcal{M}_1(d_n) = \bigl\{\Theta_{d_n} \in \mathbb{R}^{d_n} \bigm| a_n^{-1}\bigl(\sqrt{n} \max_{1 \le i \le d_n} |(\hat\gamma^*_{i,i}\hat\sigma^2(d_n))^{-1/2}(\hat\theta_i - \theta_i)| - b_n\bigr) \le V_{1-\alpha}\bigr\},$

where $V_{1-\alpha}$ denotes the $1 - \alpha$ quantile of the Gumbel-type
distribution given above. In the literature [4, 17, 23] one often finds the
confidence ellipsoids

(2.6) $\mathcal{M}_2(m) = \bigl\{\Theta_m \in \mathbb{R}^m \bigm| n\,\hat\sigma^{-2}(m)(\hat\Theta_m - \Theta_m)^T \hat\Gamma_m (\hat\Theta_m - \Theta_m) \le \chi^2_{1-\alpha}(m)\bigr\},$

where $\chi^2_{1-\alpha}(m)$ denotes the $1 - \alpha$ quantile of the
chi-squared distribution with $m$ degrees of freedom. Note that in general
$\mathcal{M}_1(d_n) \not\subseteq \mathcal{M}_2(d_n)$ and vice versa. The
confidence region $\mathcal{M}_2(d_n)$ can be viewed as a global measure,
where the impact of single elements
$\{|\hat\theta_i - \theta_i|\}_{1 \le i \le d_n}$ is negligible, which in
turn leads to suboptimal confidence regions for single elements. In contrast,
$\mathcal{M}_1(d_n)$ can be viewed as a local measure where single elements
have a large impact, which clearly leads to significantly tighter bounds for
the single elements $\{|\hat\theta_i - \theta_i|\}_{1 \le i \le d_n}$. This
is a very important issue for so-called subset autoregressive models; see
Remark 2.11.
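
Solving the inequality that defines the simultaneous band for each coordinate gives a half-width for every $\hat\theta_i$. The sketch below illustrates this step; the function names are hypothetical, the norming constants follow the formulas stated after Theorem 2.2 as printed there, and the diagonal entries $\hat\gamma^*_{i,i}$ and $\hat\sigma^2(d_n)$ are assumed to be available from a previous fit.

```python
import math
import numpy as np

def gumbel_constants(d_n):
    """a_n and b_n as stated after Theorem 2.2 (requires d_n >= 2)."""
    log_d = math.log(d_n)
    a_n = (2.0 * log_d) ** -0.5
    b_n = (2.0 * log_d) ** 0.5 - (8.0 * log_d) ** -0.5 * (
        math.log(log_d) + 4.0 * math.pi - 4.0)
    return a_n, b_n

def gumbel_quantile(alpha):
    """V_{1-alpha}, the solution v of exp(-exp(-v)) = 1 - alpha."""
    return -math.log(-math.log(1.0 - alpha))

def band_half_widths(gamma_star_diag, sigma2_hat, n, a_n, b_n, alpha=0.05):
    """c_i such that the band reads |theta_hat_i - theta_i| <= c_i for all i."""
    g = np.asarray(gamma_star_diag, dtype=float)
    return (a_n * gumbel_quantile(alpha) + b_n) * np.sqrt(g * sigma2_hat / n)
```

Keeping `a_n` and `b_n` as explicit arguments makes it easy to substitute quantiles obtained by simulation when the Gumbel approximation is considered too slow.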

Theorem 2.2 not only can be used to construct simultaneous confidence
bands for the Yule-Walker estimators $\hat\Theta_{d_n}$, but also provides a
test for the degree of an AR($q$)-process. To be more precise, for an
AR($q$)-process $\{X_k\}_{k \in \mathbb{Z}}$ satisfying the assumptions of
Theorem 2.2, we formulate the null hypothesis $H_0 : q \le q_0$, and the
alternative $H_A : q > q_0$. Since for any fixed $k \ge 1$

$P\bigl(a_n^{-1}\bigl(\sqrt{n} \max_{q_0+1 \le i \le q_0+k} |(\hat\gamma^*_{i,i}\hat\sigma^2(d_n))^{-1/2}\hat\theta_i| - b_n\bigr) \le z\bigr) \to 1$

as $n$ increases, it follows immediately from Theorem 2.2 that under $H_0$ we
have

$P\bigl(a_n^{-1}\bigl(\sqrt{n} \max_{q_0+k \le i \le d_n} |(\hat\gamma^*_{i,i}\hat\sigma^2(d_n))^{-1/2}\hat\theta_i| - b_n\bigr) \le z\bigr) \to \exp(-e^{-z})$

for any fixed integer $k \ge 1$, since we are assuming that $\theta_i = 0$
for $i > q_0$. Conversely, it is not hard to verify (see the proof of
Theorem 2.8 for details) that the quantity

$a_n^{-1}\bigl(\sqrt{n} \max_{q_0+1 \le i \le d_n} |(\hat\gamma^*_{i,i}\hat\sigma^2(d_n))^{-1/2}\hat\theta_i| - b_n\bigr)$

explodes under the alternative $H_A : q > q_0$. This can be used to establish
a lower bound for the order $q$ or to test if the order was chosen
sufficiently large. This is particularly useful if $q$ is large compared to
the sample size and the magnitude of $\Theta_q$, in which case the AIC and
related criteria sometimes fail badly to get near the true order. More
details on this subject and examples are given in Section 3. Generally
speaking, such situations are often encountered in subset autoregressive
models; see Remark 2.11.

The above conclusions lead to the following family of estimators
$\hat q^{(1)}_{z_n}$ for $q$. Let $z_n$ be a monotone sequence that tends to
infinity as $n$ increases. Then we define the estimator

(2.7) $\hat q^{(1)}_{z_n} = \min\bigl\{q \in \mathbb{N} \bigm| a_n^{-1}\bigl(\sqrt{n} \max_{q+1 \le i \le d_n} |(\hat\gamma^*_{i,i}\hat\sigma^2(d_n))^{-1/2}\hat\theta_i| - b_n\bigr) \le z_n\bigr\}.$

Using the above ideas, it is not hard to show that the estimators
$\hat q^{(1)}_{z_n}$ are consistent if $z_n$ does not grow too fast. In fact,
under some more conditions imposed on the sequence $z_n$, we can even derive
the asymptotic distribution of the estimators.
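
Since the maximum in (2.7) runs over all indices above $q$, the estimator equals the largest index whose normalized coefficient exceeds the threshold $z_n$, or zero if there is none. A minimal sketch of this rule follows; the function name `order_estimate` and its argument layout are assumptions made for illustration.

```python
import numpy as np

def order_estimate(theta_hat, gamma_star_diag, sigma2_hat, n, a_n, b_n, z_n):
    """Smallest q such that all normalised statistics with index i > q are <= z_n."""
    theta_hat = np.asarray(theta_hat, dtype=float)
    scale = np.sqrt(np.asarray(gamma_star_diag, dtype=float) * sigma2_hat)
    # T_i = a_n^{-1} * (sqrt(n) * |theta_hat_i| / scale_i - b_n)
    t_stat = (np.sqrt(n) * np.abs(theta_hat) / scale - b_n) / a_n
    exceed = np.flatnonzero(t_stat > z_n)   # 0-based positions of exceedances
    return int(exceed[-1]) + 1 if exceed.size else 0
```

For instance, with `theta_hat = [0.5, 0.0, 0.3, 0.01]`, unit scaling, `n = 100`, `a_n = 1`, `b_n = 0` and `z_n = 2`, the third coefficient is the last one above the threshold.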

Assumption 2.7. In addition to Assumption 2.1, suppose that:

• $\sum_{i=1}^{d_n} |\theta_i| < \infty$, $|\theta_{d_n}| = O((\log n)^{-2-\eta})$, $\eta > 0$,

• $E(\exp(\lambda|\varepsilon_k|)) < \infty$, for some $\lambda > 0$ and all $k \in \mathbb{Z}$,

• $|\alpha_i| = O(i^{-\beta})$, $\beta > 3/2$.

Theorem 2.8. Let $\{X_k\}_{k \in \mathbb{Z}}$ be an AR($q$)-process such that
Assumption 2.7 is valid. Assume in addition that
$\inf_h |\gamma^*_{h,h}| > 0$ and $z_n = O(\log n)$.

Then if $z_n \to \infty$, the estimator $\hat q^{(1)}_{z_n}$ in (2.7) is
consistent. Moreover, the following expansion is valid:

P{q^} =k + q) = ^ + + j 

for $k \in \mathbb{N}$, $k = O(n^{\delta})$, $\delta < 1/7$.

Remark 2.9. The stronger conditions of Assumption 2.7 are necessary
to control the rate of convergence in Theorem 2.2, which in turn allows for
the explicit expansion given above. This, however, also leads to the more
restrictive bound $q + k = O(d_n) = O(n^{\delta})$, $\delta < 1/7$; see also
Remark 6.2. If we are only interested in establishing consistency, then we
may drop these more restrictive assumptions; see in particular Theorem 2.12
below.

Remark 2.10. Theorem 2.8 yields that in some sense the estimators
$\hat q^{(1)}_{z_n}$ possess a discrete uniform asymptotic distribution,
which leads to the surprising conclusion

$P(\hat q^{(1)}_{z_n} = 1 + q) \approx P(\hat q^{(1)}_{z_n} = 1000 + q).$

This fact can be explained by the maximum function in the definition of
$\hat q^{(1)}_{z_n}$, more precisely, by the weak dependence of the
Yule-Walker estimators $\hat\Theta_{d_n}$. The maximum function essentially
does not care at which index $i$ the boundary $z_n$ is exceeded, and this
results in the uniform distribution. It turns out (see Section 3) that a
modified version of the estimator $\hat q^{(1)}_{z_n}$ is a very efficient
preliminary estimator that establishes a decent lower bound.

An asymptotic uniform-type distribution clearly is not a desirable
property for an estimator. However, similarly to Akaike's method, we can
introduce a penalty function and construct different yet also consistent
estimators for the order $q$. To this end, for $x \in \mathbb{R}$ put
$x^+ = \max(0, x)$ and let
$T_{n,i} = a_n^{-1}\bigl(\sqrt{n}\,|(\hat\gamma^*_{i,i}\hat\sigma^2(d_n))^{-1/2}\hat\theta_i| - b_n\bigr)$.
Then we introduce a new estimator $\hat q^{(2)}_{z_n}$ as



More generally, let $\mathcal{F} = (f_d)_{d \in \mathbb{N}}$ be a collection
of continuous functions such that:

• $f_d$ is a map from $\mathbb{R}^{d+2}$ to $\mathbb{R}$,

• $f_d(0, \ldots, 0, q, d) < f_d(0, \ldots, 0, q + 1, d)$ for all $d, q \in \mathbb{N}$,

• if $a_n, d_n \to \infty$ as $n$ increases, then $f_{d_n}(\ldots, a_n, \ldots) \to \infty$ as $n$
increases, regardless of the values of the other coordinates.

Define 



(2.8) $\hat q^{(f)}_{z_n} = \operatorname{argmin}_{q \in \mathbb{N}} f_{d_n}\bigl(0, \ldots, 0, (T_{n,q+1} - z_n)^+, \ldots, (T_{n,d_n} - z_n)^+, q, d_n\bigr).$



Then arguing as in the proof of Theorem 2.8 it can be shown that this 
constitutes a consistent estimator for the true value q. For example, the 
following estimator 



satisfies the conditions above and is consistent. 

Remark 2.11. Note that instead of defining a specific order $q$, one can
also consider a special lag configuration, for example,
$\Theta_q = (\theta_1, \theta_2, 0, \ldots, 0, \theta_{10}, \theta_{11}, \ldots, \theta_q)^T$.
Such configurations are commonly referred to as subset autoregressive
models; see, for instance, [16, 32, 33, 42, 46] and the references therein.
The $\mathrm{AIC}(m)$ and especially related consistent criteria have
problems dealing with such subset autoregressive models, which can be seen as
follows. By Hannan [23], Chapter VI, we have for $m \in \mathbb{N}$

(2.9) $n^{-1}\mathrm{AIC}(m) = \log\hat\phi_{n,0} + \sum_{i=1}^{m} \log\bigl(1 - \hat\theta_i^2(i)\bigr) + 2n^{-1}C_n m,$

where $\hat\theta_i(i)$ denotes the last coefficient of the fitted AR($i$)
model. This shows that in the case of subset autoregressive models, the
penalty function $2n^{-1}C_n m$ is too severe and should be replaced, at
least in theory (since this is impossible in practice), by
$2n^{-1}C_n \sum_{i=1}^{m} 1_{\{\theta_i \ne 0\}}$. Of course the same
problem arises if some of the $\{\theta_i\}_{1 \le i \le q}$ are close to
zero. A maximum-based estimator like $\hat q^{(1)}_{z_n}$ is less affected,
which is empirically confirmed in Section 3.
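
The decomposition of $\log\hat\sigma^2(m)$ into contributions of the individual lags, on which the argument of Remark 2.11 rests, is the classical Durbin-Levinson product formula and can be checked numerically. The sketch below is illustrative only; the helper names `yw_fit` and `log_sigma2_via_pacf` are assumptions, and `phi` stands for any positive definite autocovariance sequence $(\phi_0, \ldots, \phi_m)$.

```python
import numpy as np

def yw_fit(phi, m):
    """Order-m Yule-Walker solution and residual variance for m >= 1."""
    idx = np.abs(np.subtract.outer(np.arange(m), np.arange(m)))
    theta = np.linalg.solve(phi[idx], phi[1:m + 1])
    return theta, phi[0] - theta @ phi[1:m + 1]

def log_sigma2_via_pacf(phi, m):
    """log(phi_0) + sum_{i=1}^{m} log(1 - theta_i(i)^2), theta_i(i) = last coefficient of the order-i fit."""
    last_coefs = [yw_fit(phi, i)[0][-1] for i in range(1, m + 1)]
    return np.log(phi[0]) + sum(np.log(1.0 - c * c) for c in last_coefs)

phi = np.array([1.0, 0.5, 0.1, -0.1])
direct = np.log(yw_fit(phi, 3)[1])
# Both routes give the same value of log(sigma2(3)).
print(direct, log_sigma2_via_pacf(phi, 3))
```

A lag whose last coefficient is zero contributes nothing to the sum, while the penalty still charges for it; this is the effect described for subset models.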

An often encountered theoretical assumption for estimators related to
$\mathrm{AIC}(m)$ is that the parameter space for $q$ is finite; that is, it
is usually assumed in advance that $q \in \{0, \ldots, K\}$, where $K$ is
"chosen sufficiently large," but finite. In [25], $K$ is allowed to increase
with the sample size with unknown rate, which was specified later by
An et al. [3]. Note, however, that for the estimators defined above we allow
$K = K_n = d_n$. Before extending this result, we give precise definitions of
BIC, HQC, MIC (= miscellaneous information criterion) and SIC, as the
literature does not seem to be very clear on this subject, in particular in
the case of the BIC and SIC. In the sequel, the following definitions are
used:

(2.10)
$\mathrm{BIC}(m) = \mathrm{SIC}(m) = \log\hat\sigma^2(m) + m n^{-1}\log n,$
$\mathrm{MIC}(m) = \log\hat\sigma^2(m) + (m/2) n^{-1}\log n,$
$\mathrm{HQC}(m) = \log\hat\sigma^2(m) + n^{-1} 2cm \log\log n, \quad c > 1.$

This means that we use the same definitions for BIC and SIC
(asymptotically), which is the case mostly encountered in the literature. The
MIC differs from the BIC by the choice of the constant $1/2$ that naturally
leads to a less parsimonious criterion, which performs quite well in the
examples given in Section 3. Using some of the results of Sections 4 and 6,
one may prove the following.
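
The three criteria in (2.10) differ only in the penalty charged per parameter, which the following sketch makes explicit. The function names are hypothetical, and `sigma2_hat[m]` is assumed to hold the residual variance $\hat\sigma^2(m)$ of the order-$m$ fit for $m = 0, \ldots, K$.

```python
import math

def criterion_penalties(n, c=1.0):
    """Per-parameter penalties in (2.10): criterion(m) = log(sigma2_hat(m)) + m * penalty."""
    return {
        "BIC": math.log(n) / n,
        "MIC": 0.5 * math.log(n) / n,
        "HQC": 2.0 * c * math.log(math.log(n)) / n,
    }

def select_orders(sigma2_hat, n, c=1.0):
    """Order minimising each criterion over m = 0..K, where K = len(sigma2_hat) - 1."""
    selected = {}
    for name, penalty in criterion_penalties(n, c).items():
        values = [math.log(s) + m * penalty for m, s in enumerate(sigma2_hat)]
        selected[name] = values.index(min(values))
    return selected
```

With `sigma2_hat = [2.0, 1.0, 0.99, 0.989]` and `n = 500`, the stricter BIC penalty stops at order 1, whereas MIC and HQC (with `c = 1`) accept the small further reduction at order 2.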

Theorem 2.12. Assume that the conditions of Theorem 2.2 hold, and
additionally assume that $\inf_h |1 - \theta_h^2| > 0$. Let $C_n$ be a
positive sequence such that:

• $\lim_n C_n (2 \log\log n)^{-1} > 1$, $C_n = o(n)$,

• $\log d_n \le O(C_n)$.

Then the estimators for the order $q$ defined as

$\hat q = \operatorname{argmin}_{0 \le m \le d_n} \bigl(\log\hat\sigma^2(m) + n^{-1} C_n m\bigr)$

are consistent.

Remark 2.13. Note that condition $\inf_h |1 - \theta_h^2| > 0$ essentially is
already provided by the causality condition in Assumption 2.1.

Theorem 2.12 thus implies the bounds
$d_n \in \{o(n^{1/2}), o(n^{1/4}), o(\log n)\}$ for BIC, MIC and HQC, and
thus significantly improves the bounds provided by An et al. [3]
[BIC: $O(\log n)$, HQC: $O(\log\log n)$]. On the other hand, the setting in
An et al. [3] is more general, and it is also shown that the estimators are
strongly consistent.

3. Simulation and numerical results. In this section we perform a small
simulation study to compare some of the previously mentioned estimators. We
look at the performance in the case of AR(6), AR(12) and AR(24) processes.
The sample size $n$ satisfies $n \in \{125, 250, 500, 1000\}$; as for the
dimension $d_n$, we chose the functions
$d_n \in \{2 \log n, 4 \log n, 6 \log n\}$, and rounded up the values. This
implies that the parameter space $q \in \{0, \ldots, K\}$ satisfies
$K \in \{10, 12, 13, 14\}$, $K \in \{20, 23, 25, 28\}$ and
$K \in \{29, 34, 38, 42\}$, respectively. For reference, note that
$\{\lceil\sqrt{125}\rceil, \lceil\sqrt{250}\rceil, \lceil\sqrt{500}\rceil, \lceil\sqrt{1000}\rceil\} = \{12, 16, 23, 32\}$.
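
The dimensions used in the study can be recomputed directly. The short sketch below assumes natural logarithms and rounding up, as described in the text; the helper names `dimensions` and `sqrt_reference` are hypothetical.

```python
import math

SAMPLE_SIZES = [125, 250, 500, 1000]

def dimensions(c, sizes=SAMPLE_SIZES):
    """d_n = ceil(c * log n) for every sample size n."""
    return [math.ceil(c * math.log(n)) for n in sizes]

def sqrt_reference(sizes=SAMPLE_SIZES):
    """ceil(sqrt(n)), the reference values quoted for comparison."""
    return [math.ceil(math.sqrt(n)) for n in sizes]

for c in (2, 4, 6):
    print(c, dimensions(c))
print("sqrt", sqrt_reference())
```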
To introduce the estimators $\hat q^{(s)}_{x_n}(d_n)$,
$\hat q^{(s)}_{y_n}(d_n)$, we require some additional notation. For
$1 \le k \le d_n$, define $\{\hat\gamma^*_{i,i}(k)\}_{1 \le i \le k}$ and
$\{\hat\theta_i(k)\}_{1 \le i \le k}$ via the usual relation

(3.1) $\hat\Theta_k = \hat\Gamma_k^{-1}\hat\Phi_k.$

The estimators are now defined as

$\hat q^{(1)}_{z_n}(k) = \min\bigl\{q \in \mathbb{N} \bigm| a_n^{-1}\bigl(\sqrt{n} \max_{q+1 \le i \le k} |(\hat\gamma^*_{i,i}(k)\hat\sigma^2(k))^{-1/2}\hat\theta_i(k)| - b_n\bigr) \le z_n\bigr\},$

$\hat q^{(s)}_{z_n}(d_n) = \max_{1 \le k \le d_n} \hat q^{(1)}_{z_n}(k).$

Note that the definition of $a_n$, $b_n$ remains unchanged. This modification
significantly improves the performance in practice, for the following reason:
if one just considers the estimator $\hat q^{(1)}_{z_n}(d_n)$ and hence only
the equation $\hat\Theta_{d_n} = \hat\Gamma_{d_n}^{-1}\hat\Phi_{d_n}$, the
bias may be quite large since the estimate $\hat\Gamma_{d_n}^{-1}$ is rather
poor for larger $d_n$. Note that this is also true when computing the AIC or
related criteria, which is a well-established fact in the literature
(cf. [2, 17, 23, 25]). Hence one may expect that the "maximum" version
$\hat q^{(s)}_{z_n}(d_n)$ outperforms its counterpart
$\hat q^{(1)}_{z_n}(d_n)$, which is indeed the case in the examples given
below. The values for $z_n$ were chosen as $z_n \in \{x_n, y_n\}$, where
$x_n$ satisfies $a_n x_n + b_n = 2.71$ for $n \in \{125, 250\}$ and
$a_n x_n + b_n = 2.91$ for $n \in \{500, 1000\}$. Similarly, we have
$a_n y_n + b_n = 3$ for $n \in \{125, 250\}$ and $a_n y_n + b_n = 3.2$ for
$n \in \{500, 1000\}$. This means that the estimators get less parsimonious
when $d_n$ increases. Of course an adaptation to maintain the same confidence
level is possible, but the general picture remains the same.

For the criteria AIC, BIC, HQC and MIC we use the definitions given
in (1.4) and (2.10); in the case of HQC we choose $c = 1$, since, as pointed
out by Hannan and Quinn [25], "it would seem pedantic to choose values as
$c = 1.01$."

The following modifications are also considered:

(3.2)
$\mathrm{AIC}(m)^* = \max\{\mathrm{AIC}(m), \hat q^{(s)}_{z_n}(d_n)\},$
$\mathrm{BIC}(m)^* = \max\{\mathrm{BIC}(m), \hat q^{(s)}_{z_n}(d_n)\},$
$\mathrm{HQC}(m)^* = \max\{\mathrm{HQC}(m), \hat q^{(s)}_{z_n}(d_n)\},$
$\mathrm{MIC}(m)^* = \max\{\mathrm{MIC}(m), \hat q^{(s)}_{z_n}(d_n)\}.$

All simulations were carried out using the program R. In order to get
a sample of size $n$, a sample path of size $1000 + n$ was produced and the
first 1000 observations were discarded.
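
The burn-in scheme just described can be mimicked as follows. The original study used R; this Python sketch is only an illustration, and the function name `simulate_ar`, the zero start-up values and the seed handling are assumptions.

```python
import numpy as np

def simulate_ar(theta, n, burn_in=1000, seed=None):
    """Simulate X_k = theta_1 X_{k-1} + ... + theta_q X_{k-q} + eps_k, eps_k ~ N(0, 1)."""
    theta = np.asarray(theta, dtype=float)
    q = theta.size
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(burn_in + n)
    x = np.zeros(q + burn_in + n)           # q leading zeros serve as start-up values
    for k in range(q, x.size):
        # x[k - q:k][::-1] is (X_{k-1}, ..., X_{k-q})
        x[k] = theta @ x[k - q:k][::-1] + eps[k - q]
    return x[q + burn_in:]                   # discard start-up values and burn-in

# Example: one path of the sparse AR(6) model used in the simulation study.
path = simulate_ar([0.1, 0.0, 0.05, 0.0, 0.0, 0.2], n=250, seed=1)
```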

Generally speaking, unreported simulations show that in many cases the
modified criteria $\mathrm{AIC}(m)^*$, $\mathrm{BIC}(m)^*$, ... perform
nearly identically to the nonmodified ones $\mathrm{AIC}(m)$,
$\mathrm{BIC}(m)$, .... This is in particular the case when dealing with full
parameter sets, that is, $\theta_i \ne 0$, $1 \le i \le q$, and $\Theta_q$ is
sufficiently large. If this is the case, the performance of the estimators
$\hat q^{(s)}_{x_n}(d_n)$, $\hat q^{(s)}_{y_n}(d_n)$ is somewhere between the
$\mathrm{BIC}(m)$ and $\mathrm{HQC}(m)$. On the other hand, if the model is
not full and/or the order $q$ is sufficiently large, then the differences can
be quite striking. The aim of the following examples is to illustrate this
behavior.

3.1. AR(6). First note that the definitions of $x_n, y_n$ result in

$P(\max_i |\xi_i| \le 2.71) \ge 0.92$, $P(\max_i |\xi_i| \le 3) \ge 0.97$, $d_n \in \{10, 12\}$,

$P(\max_i |\xi_i| \le 2.91) \ge 0.95$, $P(\max_i |\xi_i| \le 3.2) \ge 0.98$, $d_n \in \{13, 14\}$,

where $\xi = (\xi_1, \ldots, \xi_{d_n})^T$ is a $d_n$-dimensional mean-zero
Gaussian random vector where the covariance matrix is the identity.
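
For an identity covariance matrix the probabilities above have a closed form, since the coordinates are independent: $P(\max_i |\xi_i| \le x) = (2\Phi(x) - 1)^{d_n}$, where $\Phi$ is the standard normal distribution function. The following sketch evaluates this expression; the function name `max_abs_cdf` is an assumption.

```python
import math

def max_abs_cdf(x, d):
    """P(max_i |xi_i| <= x) for d i.i.d. N(0, 1) coordinates, i.e. (2 * Phi(x) - 1) ** d."""
    # 2 * Phi(x) - 1 equals erf(x / sqrt(2)).
    return math.erf(x / math.sqrt(2.0)) ** d

for d, thresholds in [(10, (2.71, 3.0)), (12, (2.71, 3.0)),
                      (13, (2.91, 3.2)), (14, (2.91, 3.2))]:
    print(d, [round(max_abs_cdf(x, d), 3) for x in thresholds])
```

The same function can be reused for the larger dimensions of the AR(12) and AR(24) settings.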

The results shown in Tables 1 and 2 hint at what is to be expected in the
case of full models, namely that the modifications $\mathrm{AIC}(m)^*$,
$\mathrm{BIC}(m)^*$, ... perform nearly as well as the normal versions
$\mathrm{AIC}(m)$, $\mathrm{BIC}(m)$, .... The estimators
$\hat q^{(s)}_{x_n}(d_n)$, $\hat q^{(s)}_{y_n}(d_n)$ also perform quite well.

Contrary to the previous results, Tables 3 and 4 show the difference of
the modified estimators [and $\hat q^{(s)}_{x_n}(d_n)$,
$\hat q^{(s)}_{y_n}(d_n)$] if the model is very sparse. Except for the case
$n = 1000$, the modifications are notably better.



3.2. AR(12). The definitions of $x_n, y_n$ result in

$P(\max_i |\xi_i| \le 2.71) \ge 0.85$, $P(\max_i |\xi_i| \le 3) \ge 0.94$, $d_n \in \{20, 23\}$,

$P(\max_i |\xi_i| \le 2.91) \ge 0.9$, $P(\max_i |\xi_i| \le 3.2) \ge 0.96$, $d_n \in \{25, 28\}$,



http://portal.tugraz.at/portal/page/portal/TU_Graz/Einrichtungen/Institute/Homepages/i5060/research/R_Code.

Table 1
Simulation of an AR(6) process with coefficients $\Theta_6 = (0.1, -0.3, 0.05, 0.2, -0.1, 0.2)^T$,
$\varepsilon \sim N(0, 1)$, 1000 repetitions, $d_n \in \{10, 12\}$

n     $\hat q$  AIC   AIC*  BIC   BIC*  HQC   HQC*  MIC   MIC*  $\hat q^{(s)}_{y_n}$  $\hat q^{(s)}_{x_n}$
125   <5        428   427   943   808   746   704   550   545   816                   701
      5         65    65    10    30    32    40    58    58    28                    41
      6         344   341   45    143   191   214   295   294   137                   196
      7         66    65    1     5     23    24    54    53    5                     14
      >7        97    102   1     14    8     18    43    50    14                    48
250   <5        93    89    693   432   328   282   202   188   440                   299
      5         24    23    14    32    32    32    33    31    42                    38
      6         646   632   287   481   586   595   649   634   467                   543
      7         96    95    5     8     37    35    74    73    4                     9
      >7        141   161   1     47    17    56    42    74    47                    111



where $\xi = (\xi_1, \ldots, \xi_{d_n})^T$ is a $d_n$-dimensional mean-zero
Gaussian random vector where the covariance matrix is the identity.

The results are depicted in Tables 5, 6, 7 and 8, and are quite similar to
the case of the AR(6) processes. If the model is rather full,
$\mathrm{AIC}(m)^*$, $\mathrm{BIC}(m)^*$, ... perform nearly as well as the
normal versions $\mathrm{AIC}(m)$, $\mathrm{BIC}(m)$, ..., whereas in the
case of the sparse model, a significant difference can be observed.

3.3. AR(24). In this case, the definitions of $x_n, y_n$ result in

$P(\max_i |\xi_i| \le 2.71) \ge 0.795$, $P(\max_i |\xi_i| \le 3) \ge 0.912$, $d_n \in \{29, 34\}$,

$P(\max_i |\xi_i| \le 2.91) \ge 0.86$, $P(\max_i |\xi_i| \le 3.2) \ge 0.94$, $d_n \in \{38, 42\}$,

where $\xi = (\xi_1, \ldots, \xi_{d_n})^T$ is a $d_n$-dimensional mean-zero
Gaussian random vector where the covariance matrix is the identity. The
behavior shown in



Table 2
Simulation of an AR(6) process with coefficients $\Theta_6 = (0.1, -0.3, 0.05, 0.2, -0.1, 0.2)^T$,
$\varepsilon \sim N(0, 1)$, 1000 repetitions, $d_n \in \{13, 14\}$

n     $\hat q$  AIC   AIC*  BIC   BIC*  HQC   HQC*  MIC   MIC*  $\hat q^{(s)}_{y_n}$  $\hat q^{(s)}_{x_n}$
500   <5        1     1     177   75    29    25    15    15    86                    52
      5         3     3     9     11    6     6     3     3     17                    14
      6         730   713   805   874   913   889   892   867   865                   849
      7         108   108   8     8     42    42    57    57    0                     2
      >7        158   175   1     32    10    38    33    58    32                    83
1000  <5        0     0     3     0     0     0     0     0     0                     0
      5         0     0     0     0     0     0     0     0     0                     0
      6         724   709   990   951   952   917   934   901   955                   885
      7         103   101   7     9     36    34    47    44    5                     7
      >7        173   190   0     40    12    49    19    55    40                    108

Table 3
Simulation of an AR(6) process with coefficients $\Theta_6 = (0.1, 0, 0.05, 0, 0, 0.2)^T$,
$\varepsilon \sim N(0, 1)$, 1000 repetitions, $d_n \in \{10, 12\}$

n     $\hat q$  AIC   AIC*  BIC   BIC*  HQC   HQC*  MIC   MIC*  $\hat q^{(s)}_{y_n}$  $\hat q^{(s)}_{x_n}$
125   <5        719   699   998   854   944   842   839   787   854                   747
      5         11    11    0     0     2     2     7     7     0                     11
      6         168   181   2     124   43    126   107   145   124                   184
      7         44    44    0     4     8     11    23    24    4                     8
      >7        58    65    0     18    3     19    24    37    18                    50
250   <5        290   276   960   437   723   424   550   396   438                   321
      5         6     6     0     3     2     3     5     5     3                     5
      6         491   488   39    513   245   503   376   494   513                   573
      7         91    90    1     2     21    21    40    40    1                     7
      >7        122   140   0     45    9     49    29    65    45                    94



Tables 9, 10, 11 and 12 is as in the previous two cases. The difference in
the sparse model is perhaps the most striking one.

4. Proofs and ramification. In this section, we will prove Theorems 2.2,
2.8, 2.12, and also explicitly mention some auxiliary results which are of
interest in themselves. For $d_n \le m$ let
$\Gamma_m^{-1} = (\gamma^*_{i,j})_{1 \le i,j \le m}$ be the inverse of the
covariance matrix $\Gamma_m = (\gamma_{i,j})_{1 \le i,j \le m}$ associated to
the AR($d_n$) process $\{X_k\}_{k \in \mathbb{Z}}$. Due to Galbraith and
Galbraith [21], it holds that

(4.1) $\sigma^2\gamma^*_{i,j} = \sum_{r=0}^{\alpha} \theta_r\theta_{r+j-i} - \sum_{r=\beta}^{d_n+i-j} \theta_r\theta_{r+j-i}, \quad 1 \le i \le j \le m,$

r=0 r=p 



Table 4
Simulation of an AR(6) process with coefficients $\Theta_6 = (0.1, 0, 0.05, 0, 0, 0.2)^T$,
$\varepsilon \sim N(0, 1)$, 1000 repetitions, $d_n \in \{13, 14\}$

n     $\hat q$  AIC   AIC*  BIC   BIC*  HQC   HQC*  MIC   MIC*  $\hat q^{(s)}_{y_n}$  $\hat q^{(s)}_{x_n}$
500   <5        21    21    761   102   267   98    164   85    102                   56
      5         0     0     1     0     0     0     0     0     0                     1
      6         663   655   234   871   675   822   736   796   874                   863
      7         125   124   4     3     50    49    69    68    0                     10
      >7        191   200   0     24    8     31    31    51    24                    70
1000  <5        0     0     168   1     3     1     1     1     1                     0
      5         0     0     0     0     0     0     0     0     0                     0
      6         702   683   822   949   940   905   919   887   955                   898
      7         121   119   9     9     43    42    52    52    3                     9
      >7        177   198   1     41    14    52    28    60    41                    93

Table 5
Simulation of an AR(12) process with nonzero coefficients $\theta_1 = 0.1$, $\theta_3 = -0.4$, $\theta_5 = 0.5$,
$\theta_7 = -0.1$, $\theta_8 = 0.05$, $\theta_{10} = -0.3$, $\theta_{12} = 0.2$, $\varepsilon \sim N(0, 1)$, 1000 repetitions, $d_n \in \{20, 23\}$

n     $\hat q$  AIC   AIC*  BIC   BIC*  HQC   HQC*  MIC   MIC*  $\hat q^{(s)}_{y_n}$  $\hat q^{(s)}_{x_n}$
125   <11       705   701   995   966   931   917   812   807   969                   929
      11        79    79    2     3     22    22    54    54    1                     2
      12        141   141   3     23    40    47    97    98    22                    47
      13        48    48    0     4     6     9     30    30    4                     11
      >13       27    31    0     4     1     5     7     11    4                     11
250   <11       257   257   854   730   573   560   423   421   748                   620
      11        39    39    9     10    31    31    39    39    3                     11
      12        495   493   135   247   349   356   442   441   237                   313
      13        115   115   2     4     40    40    65    65    3                     13
      >13       94    96    0     9     7     13    31    34    9                     43



where

$\alpha = \min\{i - 1, d_n + i - j, m - j\}, \quad \beta = \max\{i - 1, m - j\},$

and either of the sums is taken to be zero if its upper limit is less than
its lower limit. The second sum is zero unless
$m - d_n + 1 \le i \le j \le d_n$, while both sums are zero if
$j - i > d_n$. Note that this implies $\sigma^2\gamma^*_{m,m} = 1$ for
$m > d_n$, and in particular that

(4.2) $\sup_{|h| \ge d_n} \sup_i |\gamma^*_{i,i+h}| = o((\log n)^{-1}),$

if Assumption 2.1 is valid. Throughout this section and particularly in the
proofs of the presented results, we use the notation
$\hat\sigma^2 = \hat\sigma^2(d_n)$. Note that

Table 6
Simulation of an AR(12) process with nonzero coefficients $\theta_1 = 0.1$, $\theta_3 = -0.4$, $\theta_5 = 0.5$,
$\theta_7 = -0.1$, $\theta_8 = 0.05$, $\theta_{10} = -0.3$, $\theta_{12} = 0.2$, $\varepsilon \sim N(0, 1)$, 1000 repetitions, $d_n \in \{25, 28\}$

n     $\hat q$  AIC   AIC*  BIC   BIC*  HQC   HQC*  MIC   MIC*  $\hat q^{(s)}_{y_n}$  $\hat q^{(s)}_{x_n}$
500   <11       19    19    367   256   110   106   75    73    269                   183
      11        4     4     4     4     6     6     6     6     2                     2
      12        684   680   618   705   808   793   808   797   702                   758
      13        129   128   10    12    63    62    78    76    4                     8
      >13       164   169   1     23    13    33    33    48    23                    49
1000  <11       0     0     11    2     0     0     0     0     2                     1
      11        0     0     0     0     0     0     0     0     0                     0
      12        679   676   970   947   925   900   896   873   958                   914
      13        151   150   17    17    61    60    79    78    6                     13
      >13       170   174   2     34    14    40    25    49    34                    72

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Table 7 

Simulation of an AR(12) process with nonzero coefficients 9i = 0.1, 9s = —0.4, 612 — 0.2, 
e ^ Af{0, 1), 1000 repetitions, d„ G {20, 23} 



n 


Q 


AIC 


AIC* 


BIG 


BIG* 


HQG 


HQG* 


MIG 


MIG* 




^(5) 


125 


<10 


884 


853 


1000 


920 


995 


920 


963 


910 


920 


861 




11 


3 


3 














1 


1 





3 




12 


68 


94 





71 


5 


71 


25 


70 


71 


114 




13 


11 


13 





3 





3 


4 


7 


3 


5 




>13 


34 


37 





6 





6 


7 


12 


6 


17 


250 


<10 


509 


421 


999 


555 


934 


552 


792 


530 


555 


424 




11 


3 


3 





3 





2 


2 


3 


3 


4 




12 


340 


419 


1 


421 


59 


419 


170 


416 


421 


514 




13 


67 


68 





2 


4 


6 


18 


19 


2 


5 




>13 


81 


89 





19 


3 


21 


18 


32 


19 


53 



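we can rewrite the equation defining the AR($d_n$) process as

(4.3) $\mathbf{Y} = \mathbf{X}\Theta_{d_n} + \mathbf{Z}$,

where $\mathbf{Y} = (X_1, \ldots, X_n)^T$, $\mathbf{Z} = (\varepsilon_1, \ldots, \varepsilon_n)^T$, and the $n \times d_n$ design matrix $\mathbf{X}$ is given as

$$\mathbf{X} = \begin{pmatrix} X_0 & X_{-1} & \cdots & X_{1-d_n} \\ X_1 & X_0 & \cdots & X_{2-d_n} \\ \vdots & \vdots & & \vdots \\ X_{n-1} & X_{n-2} & \cdots & X_{n-d_n} \end{pmatrix}.$$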
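
As an illustration of the regression form (4.3), the following minimal Python sketch simulates a Gaussian AR path and builds the design matrix row by row. The function names `simulate_ar` and `design_matrix`, the zero start-up values and the burn-in length are assumptions made for this example and are not taken from the paper.

```python
import random

def simulate_ar(theta, n, burn_in=200, seed=0):
    """Simulate X_{1-d},...,X_n and eps_1,...,eps_n for an AR(d) model.

    Hypothetical helper: starts from zeros and discards a burn-in period.
    """
    rng = random.Random(seed)
    d = len(theta)
    path = [0.0] * d
    eps = []
    for _ in range(burn_in + d + n):
        e = rng.gauss(0.0, 1.0)
        # X_t = theta_1 X_{t-1} + ... + theta_d X_{t-d} + eps_t
        value = sum(theta[j] * path[-1 - j] for j in range(d)) + e
        path.append(value)
        eps.append(e)
    return path[-(n + d):], eps[-n:]

def design_matrix(x, n, d):
    """Row k (k = 1..n) is (X_{k-1}, ..., X_{k-d}); x[0] holds X_{1-d}."""
    return [[x[k - j + d - 1] for j in range(1, d + 1)] for k in range(1, n + 1)]

theta = [0.5, -0.3]
n, d = 50, len(theta)
x, eps = simulate_ar(theta, n)
X = design_matrix(x, n, d)
Y = x[d:]  # (X_1, ..., X_n)
# Residual of Y = X Theta + Z, which should vanish up to rounding.
max_residual = max(
    abs(Y[k] - sum(theta[j] * X[k][j] for j in range(d)) - eps[k])
    for k in range(n)
)
print(max_residual)
```

The check at the end only confirms that the rows of the matrix line up with the lags in the AR equation; it says nothing about estimation accuracy.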



Table 8

Simulation of an AR(12) process with nonzero coefficients $\theta_1 = 0.1$, $\theta_3 = -0.4$, $\theta_{12} = 0.2$, $\varepsilon \sim \mathcal{N}(0, 1)$, 1000 repetitions, $d_n \in \{25, 28\}$

    n  $\hat q$   AIC  AIC*   BIC  BIC*   HQC  HQC*   MIC  MIC*  $\hat q_{z_n}^{(1)}$  $\hat q_{z_n}^{(2)}$
  500       <11    77    58   983   125   613   125   402   115                   125                    78
             11     -     -     -     2     -     2     -     1                     2                     1
             12   663   678    17   858   360   834   532   808                   858                   870
             13   104   103     -     3    15    16    39    40                     3                     4
            >13   156   161     -    12    12    23    27    36                    12                    47
 1000       <11     -     -   689     2    67     2    35     2                     2                     2
             11     -     -     -     -     -     -     -     -                     -                     -
             12   706   701   307   971   880   926   893   907                   972                   936
             13   124   123     2     2    39    38    54    53                     1                     3
            >13   170   176     2    25    14    34    18    38                    25                    59

Table 9

Simulation of an AR(24) process with nonzero coefficients $\theta_1 = 0.6$, $\theta_2 = -0.1$, $\theta_4 = 0.05$, $\theta_7 = 0.15$, $\theta_8 = -0.27$, $\theta_{10} = 0.1$, $\theta_{12} = -0.2$, $\theta_{15} = -0.25$, $\theta_{18} = 0.05$, $\theta_{20} = 0.1$, $\theta_{21} = -0.3$, $\theta_{24} = 0.17$, $\varepsilon \sim \mathcal{N}(0, 1)$, 1000 repetitions, $d_n \in \{29, 34\}$

    n  $\hat q$   AIC  AIC*   BIC  BIC*   HQC  HQC*   MIC  MIC*  $\hat q_{z_n}^{(1)}$  $\hat q_{z_n}^{(2)}$
  125       <23   972   970  1000   996  1000   996   992   990                   996                   989
             23    12    12     -     1     -     1     5     5                     1                     2
             24     3     3     -     1     -     1     1     1                     1                     6
             25    10    10     -     -     -     -     2     2                     -                     1
            >25     3     5     -     2     -     2     -     2                     2                     2
  250       <23   518   516   995   923   872   840   727   717                   924                   845
             23   120   120     2    13    48    50    77    78                    12                    25
             24   185   186     3    57    67    90   135   138                    57                    98
             25    89    89     -     1     7     8    38    38                     1                    10
            >25    88    89     -     6     6    12    23    29                     6                    22



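We have

$$n^{-1/2}\Gamma_{d_n}^{-1}\mathbf{X}^T\mathbf{Z} = n^{-1/2}\Gamma_{d_n}^{-1}\sum_{k=1}^{n} \mathbf{V}_k = n^{-1/2}\sum_{k=1}^{n} \mathbf{U}_k,$$

where $\mathbf{V}_k = (V_k^{(1)}, \ldots, V_k^{(d_n)})^T$, $\mathbf{U}_k = (U_k^{(1)}, \ldots, U_k^{(d_n)})^T$. The following results are key ingredients.

Lemma 4.1. Let $\{X_k\}_{k \in \mathbb{Z}}$ be an AR($d_n$) process, such that Assumption 2.1 is valid. Then

$$P\big(\|\hat\Gamma_{d_n}^{-1} - \Gamma_{d_n}^{-1}\|_\infty > (\log n)^{-\chi_1}\big) = O\Big(\frac{(d_n(\log n)^{\chi_1})^p}{n^{p/2}}\Big), \quad \chi_1 > 0.$$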

Table 10

Simulation of an AR(24) process with nonzero coefficients $\theta_1 = 0.6$, $\theta_2 = -0.1$, $\theta_4 = 0.05$, $\theta_7 = 0.15$, $\theta_8 = -0.27$, $\theta_{10} = 0.1$, $\theta_{12} = -0.2$, $\theta_{15} = -0.25$, $\theta_{18} = 0.05$, $\theta_{20} = 0.1$, $\theta_{21} = -0.3$, $\theta_{24} = 0.17$, $\varepsilon \sim \mathcal{N}(0, 1)$, 1000 repetitions, $d_n \in \{38, 42\}$

    n  $\hat q$   AIC  AIC*   BIC  BIC*   HQC  HQC*   MIC  MIC*  $\hat q_{z_n}^{(1)}$  $\hat q_{z_n}^{(2)}$
  500       <23    63    62   716   545   302   288   210   205                   589                   430
             23    38    38    55    60    87    87    85    85                    58                    71
             24   513   512   208   357   490   500   525   526                   326                   437
             25   192   192    18    28    93    93   129   129                    19                    27
            >25   194   196     3    10    28    32    51    55                     8                    35
 1000       <23     -     -    81    30     6     5     3     3                    42                    18
             23     -     -    34    31     8     7     6     6                    48                    35
             24   562   552   835   857   796   775   761   741                   868                   842
             25   197   195    48    45   140   137   160   156                     7                    24
            >25   241   253     2    37    50    76    70    94                    35                    81

Table 11

Simulation of an AR(24) process with nonzero coefficients $\theta_1 = 0.6$, $\theta_2 = -0.1$, $\theta_4 = 0.05$, $\theta_{10} = 0.1$, $\theta_{12} = -0.2$, $\theta_{24} = 0.17$, $\varepsilon \sim \mathcal{N}(0, 1)$, 1000 repetitions, $d_n \in \{29, 34\}$

    n  $\hat q$   AIC  AIC*   BIC  BIC*   HQC  HQC*   MIC  MIC*  $\hat q_{z_n}^{(1)}$  $\hat q_{z_n}^{(2)}$
  125       <23  1000   991  1000   991  1000   991  1000   991                   991                   969
             23     -     2     -     2     -     2     -     2                     2                     6
             24     -     6     -     6     -     6     -     6                     6                    20
             25     -     1     -     1     -     1     -     1                     1                     1
            >25     -     -     -     -     -     -     -     -                     -                     4
  250       <23   857   768  1000   817   998   817   986   815                   817                   702
             23     1    15     -    27     -    26     -    25                    27                    39
             24    99   166     -   142     2   143    13   145                   142                   225
             25    20    22     -     3     -     3     -     3                     3                     5
            >25    23    29     -    11     -    11     1    12                    11                    29



Lemma 4.2. Assume that the assumptions of Theorem 2.2 are valid. 
Then we have 



n 



k=l 



> (logn)"^i =c(l), 



where 1 < xi • 



Lemma 4.3. Assume that the assumptions of Theorem 2.2 are valid. 
Then: 



(i) lim PI max a ^ 

n— >-oo \ l<h<dn 



(h) 



k=l 



<Un = exp(-exp(-x)), 



Table 12

Simulation of an AR(24) process with nonzero coefficients $\theta_1 = 0.6$, $\theta_2 = -0.1$, $\theta_4 = 0.05$, $\theta_{10} = 0.1$, $\theta_{12} = -0.2$, $\theta_{24} = 0.17$, $\varepsilon \sim \mathcal{N}(0, 1)$, 1000 repetitions, $d_n \in \{38, 42\}$

    n  $\hat q$   AIC  AIC*   BIC  BIC*   HQC  HQC*   MIC  MIC*  $\hat q_{z_n}^{(1)}$  $\hat q_{z_n}^{(2)}$
  500       <23   351   270  1000   383   952   380   854   379                   383                   256
             23     2     8     -    51     -    48     -    41                    51                    61
             24   451   522     -   547    45   550   130   545                   547                   637
             25    74    73     -     -     3     3    13    13                     -                     2
            >25   122   127     -    19     -    19     3    22                    19                    44
 1000       <23    10     6   986    15   440    15   280    15                    15                     3
             23     -     -     -    14     -    13     -    11                    14                    12
             24   718   715    14   941   522   908   659   887                   941                   905
             25   121   118     -     3    32    31    46    45                     3                     8
            >25   151   161     -    27     6    33    15    42                    27                    72

(ii) $\sqrt{n}\,\|\hat\Phi_{d_n} - \Phi_{d_n}\|_\infty = O_P(\sqrt{\log d_n})$,

where $u_n = a_n z + b_n$, and $a_n$, $b_n$, $z$ are as in Theorem 2.2.

The proofs of Lemmas 4.1, 4.2 and 4.3 are given in Section 5. Based on the 
above results, one readily derives the following weak version of Theorem 2.2. 

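Corollary 4.4. Assume that the assumptions of Theorem 2.2 are valid.
Then for $z \in \mathbb{R}$

$$P\Big(a_n^{-1}\Big(\sqrt{n} \max_{1 \le h \le d_n} \big|(\gamma^*_{h,h}\sigma^2)^{-1/2}(\hat\theta_h - \theta_h)\big| - b_n\Big) \le z\Big) \to \exp(-e^{-z}),$$

where $a_n$ and $b_n$ are as in Theorem 2.2.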
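
To make the use of Corollary 4.4 concrete, here is a small Python sketch that turns the Gumbel limit $\exp(-e^{-z})$ into a critical value for the maximum statistic and a simultaneous half-width. The norming constants $a_n$, $b_n$ are defined in Theorem 2.2 and are simply passed in as arguments here; the function names and the numerical inputs in the usage lines are illustrative assumptions only.

```python
import math

def gumbel_cdf(z):
    """Limit distribution exp(-exp(-z)) from Corollary 4.4."""
    return math.exp(-math.exp(-z))

def gumbel_quantile(level):
    """Solve exp(-exp(-z)) = level for z, with 0 < level < 1."""
    return -math.log(-math.log(level))

def band_half_width(level, a_n, b_n, n, scale):
    """Asymptotic simultaneous half-width for one coefficient.

    From a_n^{-1}(sqrt(n) * max|.|/scale - b_n) <= z one gets
    |theta_hat_h - theta_h| <= (a_n * z + b_n) * scale / sqrt(n),
    where scale stands for (gamma*_{h,h} sigma^2)^{1/2}.
    """
    z = gumbel_quantile(level)
    return (a_n * z + b_n) * scale / math.sqrt(n)

z95 = gumbel_quantile(0.95)
# Placeholder constants: a_n and b_n must be taken from Theorem 2.2.
width = band_half_width(0.95, a_n=0.4, b_n=2.0, n=500, scale=1.0)
print(round(z95, 4), round(width, 4))
```

In practice the scale would be replaced by an estimate, which is the step handled in the proof of Theorem 2.2.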



Corollary 4.5. Under the same conditions as in Theorem 2.2, we have 

Throughout the proofs, the following inequality will be frequently used.
For random variables $X_1, \ldots, X_q$ and $\epsilon > 0$, the inequality between the geometric and arithmetic mean implies

(4.4) $P\Big(\prod_{i=1}^{q} |X_i| \ge \epsilon\Big) \le \sum_{i=1}^{q} P\big(|X_i| \ge \epsilon^{1/q}\big).$

Proof of Corollary 4.4. It holds that 



max a 

l<h<d„ 



< 



1 



-1/2 



^h) 



(h) 



k=l 



-1 



max 

l<h<dn 



{h) 



k=l 



Since $\inf_h \gamma^*_{h,h} > 0$, choosing $\chi_1 > 1$, the claim follows from Lemmas 4.2
and 4.3. □

Proof of Corollary 4.5. Trivially, it holds that 



2 

a —a 



bo-cPo + @d„^dr.-®d„^d„ 

6o-(^o + (el-0jj(*d„-*d„ 



By Corollary 4.4 and Lemma 4.3 we have 

$= O_P(n^{-1/2} \log n)$.
Similarly, we obtain from Lemmas 4.3, 5.2 and Assumption 2.1

Op{n~^/^ logn), 

)o| = C'p(n-i/2logn). 



(©I 



Moreover, from the above one readily deduces l^o 
Piecing everything together, the claim follows. □ 



Proof of Theorem 2.2. Due to Corollary 4.4, it suffices to show that 
the error difference 



(4.5) 



max Aj 

l<i<dn 



max 



l<i<dn 

Op{{logn)-^^) 



for some xi > 1- Note that per assumption we have that (logn)^^^^ ^^"^dn = 
0(1) for some X2 > 1- Moreover, 

max A,< max | ((tm?')'/' - (7m^')'/')(7m?')-'/'| 
xV^ max \(e,-9.Myr'/\ 

0<l<dn 

Corollary 4.4 gives us ^/nmayi()<i<dJ(Oi - di){^*ia'^)~^^'^ \ = Op{logn), hence 
we need to study \{j*,a'^)^^^ - (7*,fT2)i/2|(^* .52)-i/2^ g-^^^g 



^* ^2 * 2 



< 



h.i 



it suffices to treat (74*40"^ — 7j*,j<7^)(7j*jS'^) ^ . For e = logn we have 

^{k'7M-^'T;;J>£7M^V2} 
Since cr^,^ii > C > 0, we have from Lemma 4.1 that for 1 <xi < X2 



P{ max |(7^7*,- — o"^7*,| > logn min 
\0<i<d„ ' ' l<i<d„ 



7*,)=0(logn>^^Pn-^'/2<). 
In order to treat $|\sigma^2\hat\gamma^*_{i,i} - \hat\sigma^2\hat\gamma^*_{i,i}|$, note that

$$\max_{i} |\sigma^2\hat\gamma^*_{i,i} - \hat\sigma^2\hat\gamma^*_{i,i}| \le \max_{i} \big(|\hat\sigma^2 - \sigma^2|\,|\hat\gamma^*_{i,i} - \gamma^*_{i,i}| + \gamma^*_{i,i}\,|\hat\sigma^2 - \sigma^2|\big),$$

which by virtue of Corollary 4.5 and Lemma 4.1 is of the magnitude $O_P(n^{-1/2} \log n)$. We thus obtain that

$$P\Big(\max_{1 \le i \le d_n} \Delta_i > (\log n)^{-\chi_1}\Big) = o(1)$$

for some $\chi_1 > 1$, which completes the proof. □

Proof of Corollary 2.5. First note that both conditions (i) and (ii) 
imply that \ai\ = 0{p~^), < /o < 1 (cf. [17]). Hence Remark 2.3 yields that 
we may choose dn = 0{n^), < 5 < 1/2. Now assume that (i) holds. Then 
relation (4.1) implies 

dn dn 

i=l i=l 

whence the claim. If (ii) holds, then for large enough n we obtain similarly 

^'i^f7M>E^'-E^'^E^'^i' 

j=0 i=/3 i=0 

where a, /3 are as in (4.1). □ 

We are now ready to prove Theorem 2.8. 

Proof of Theorem 2.8. Let $q_0 = q$ be the true order of the AR($q$)-process $\{X_k\}_{k \in \mathbb{Z}}$, put

\n = a-\^\{%,d^r^'\e, -e,)\- 6„,) 

and assume first that $k \in \mathbb{N}$, $k > 0$. Note that $\theta_i = 0$ for $i > q$. Then we have
that

P{Qz„ = k + q) = P( {Oq+k,n > Zn}ri\ max Oi^n <Zn\) 

\ '~k+q+l<i<dn J ' 

= P[ max 'Bi n <Zn \ - P( max 0i n < Zn 

\k+q<i<dn / \fc+<j+l<'t<d„ 

Due to Theorem 6.1, we can approximate the sequence {9i,n}i<i<d„ by 
a suitably transformed corresponding sequence of mean-zero Gaussian ran- 
dom variables = {£,n,i, ■ ■ ■ , Cn,dn)'^ with covariance matrix . Let t]^^ = 
(?7n,i, • • • ) f/n,d„)"^ be another sequence of i.i.d. mean-zero Gaussian random 
variables with unit variance. Following Deo [18], we obtain from max|r|^ 
r^^l = o((i-^) that for fixed / G N 

P( max a~^{\S,n,i\ -bn) < Zn) - P( max a''^ {\rin,i\ - bn) < Zn 

\q+l<i<dn ' \q+l<i<d„ 



<C \Pi,j\idn 
l<i<j<d„ 



Imitating the technique in Berman [9] , we obtain that the above quantity is 
of the magnitude o{dn ^"~'"^''/^). This yields 

Piqz„ = k + q) 

= P[ , niax a':^^{\rin4-bn)<Zn) 

\q+k+l<i<dn / 

-P( max a-Wr^nA-bn)<Zn)+o{n-'' + 4-'^'+i)n) 

\q+k<.i<d„ / 
= P{a~\\7]n,l\ - fen) < Z^f^-'^-'il - P{a-\\r]n,l\ - bn) < Z„)) 

From the definition of a„,6„, and since z„ — )• oo, we obtain that (Deo [18]) 

(4.6) limP{a-H\7jn,i\ - bn) < Zn)''""'^-^ ^ 1, 

n 

(4.7) PKWr^nM - bn) >^n) = ^ ^ j ' 

This yields 

(4.8) P(g,„ =k + q) = ^ + oi^-^ + 4"^"+'^/' J 
and in particular 

d„ 

(4.9) Piq,^ >Q) = Y1 = k + q) = e"^" + 0(6"^" + d-'"+^), 

k=l 

and per assumption the right-hand side goes to zero as $n$ increases. We now
consider the case $P(\hat q_{z_n} < q)$. To this end, let $k \in \mathbb{N}$, $k > 0$. Then we have

PiQz^ = q-k)< P(eg_k,n < Zn) 

= P{an^{\U,q-k + VnOq_k\ " &n) < Zn) 
Since > 0, one readily verifies by known properties of the Gaussian
c.d.f. that $P(a_n^{-1}(|\xi_{n,q-k} + \sqrt{n}\theta_{q-k}| - b_n) \le z_n) = O(n^{-\nu})$, and hence

(4.10) $P(\hat q_{z_n} = q - k) = O(n^{-\nu})$

and in particular

(4.11) $P(\hat q_{z_n} < q) = O(d_n n^{-\nu}) \to 0$

as $n$ increases. This together with (4.9) establishes consistency. □
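
The event decomposition used in this proof amounts to taking the largest lag whose normalized statistic exceeds the threshold $z_n$. A minimal Python sketch of that selection rule follows; the function name `select_order_by_threshold`, the input layout and the convention of returning 0 when no statistic exceeds the threshold are assumptions made for illustration.

```python
def select_order_by_threshold(stats, z_n):
    """Return the largest lag i (1-based) with stats[i-1] > z_n, else 0.

    stats[i-1] is assumed to hold the normalized statistic for lag i,
    i = 1, ..., d_n (computed elsewhere).
    """
    order = 0
    for lag, value in enumerate(stats, start=1):
        if value > z_n:
            order = lag
    return order

# Toy input: lags 1 and 3 exceed the threshold, lags 4 and 5 do not.
example = [5.2, 0.3, 4.1, -0.7, 0.9]
print(select_order_by_threshold(example, z_n=2.0))
```

Note that a small statistic at an intermediate lag (lag 2 in the toy input) does not lower the selected order, which is why zero or near-zero intermediate coefficients are unproblematic for this rule.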

Proof of Theorem 2.12. Let $q_0 = q$ be the true order of the AR($q$)-process $\{X_k\}_{k \in \mathbb{Z}}$. The proof then consists of two parts. It is first shown
that $P(\hat q < q) \to 0$, whereas in the second part the claim $P(\hat q > q) \to 0$ is
established.

First note that Lemma 5.2 and the Cauchy interlacing theorem yield 
that ||rfc||oo, ||r^^||oo < C < oo, uniformly for 1 < A; < dn- Hence, using that 

= 0fc, Lemma 4.3 and a slight adaption of Lemma 4.1 imply that 
|S'^(/c) — cr^(A;)| = Op{l) uniformly for 1 < A; < g. 
Since inf/j |1 — > 0, we conclude that inffc o"^(/c) > and hence 

(4.12) \\og{a\k))-\og{a\k))\=o^{l). 
By Hannan [23], Chapter VI, it holds that for A; G N 

k 

(4.13) \og{d\k)) = iog0„,o + j;iog(i - e]{k)). 

Then, arguing as in Hannan and Quinn [25], we have due to $C_n = o(n)$ that
for large enough $n$

$$f_n(k) = \log(\hat\sigma^2(k)) + n^{-1} C_n k$$

is a decreasing function in $k$ for $0 \le k \le q$, and strictly decreasing for
$q - 1 \le k \le q$ (since $\theta_q^2 > 0$) with probability approaching one. This implies
that eventually $\hat q \ge q$, hence it suffices to establish that the probability of
overestimating the order goes to zero as $n$ increases, that is,

(4.14) $\lim_{n \to \infty} P\Big(\operatorname{argmin}_{q \le k \le d_n} \big(\log(\hat\sigma^2(k)) + n^{-1} C_n k\big) \ge q + 1\Big) = 0.$

Using the same arguments as in [3], it follows that it suffices to establish 

(4.15) limp(^ max ^ - log(l - ^(A;)) - n-^C^fc^ > 0^ = 0. 
By Theorem 2.2, we have that 

(4.16) ||0fc||oo =C'p(n~Mogd„) foi qo<k<dn. 
This implies that for some increasing $x_n \to \infty$, we obtain that

k+q 

(4.17) - log(l - e]{k)) < kxnn~hogdn, 

j=l+q 

with probability approaching one. Since $\log d_n = o(C_n)$ per assumption,
(4.15) follows, which completes the proof. □
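
The criterion $f_n(k) = \log(\hat\sigma^2(k)) + n^{-1} C_n k$ can be evaluated for all $k \le d_n$ in one pass of the Levinson-Durbin recursion, which yields the partial autocorrelations and the innovation variances order by order, in the spirit of (4.13). The Python sketch below is an illustration under assumptions: the function names are hypothetical, the autocovariances are passed in directly rather than estimated, and the penalty $C_n = \log n$ (the usual BIC penalty) in the usage lines is used purely as an example value.

```python
import math

def levinson_durbin(phi, d):
    """Run the Levinson-Durbin recursion on autocovariances phi[0..d].

    Returns (pacf, sigma2) where pacf[k-1] is the lag-k partial
    autocorrelation and sigma2[k] the innovation variance of the
    fitted AR(k) model (sigma2[0] = phi[0]).
    """
    coeffs, pacf, sigma2 = [], [], [phi[0]]
    for k in range(1, d + 1):
        acc = phi[k] - sum(coeffs[j] * phi[k - 1 - j] for j in range(k - 1))
        kappa = acc / sigma2[-1]
        coeffs = [coeffs[j] - kappa * coeffs[k - 2 - j] for j in range(k - 1)] + [kappa]
        pacf.append(kappa)
        sigma2.append(sigma2[-1] * (1.0 - kappa ** 2))
    return pacf, sigma2

def select_order_ic(phi, n, d, c_n):
    """Minimise f_n(k) = log(sigma2(k)) + c_n * k / n over k = 0..d."""
    _, sigma2 = levinson_durbin(phi, d)
    scores = [math.log(sigma2[k]) + c_n * k / n for k in range(d + 1)]
    return min(range(d + 1), key=scores.__getitem__)

# Exact autocovariances of an AR(1) with coefficient 0.5 and unit noise.
rho, d, n = 0.5, 4, 200
phi = [rho ** h / (1.0 - rho ** 2) for h in range(d + 1)]
pacf, sigma2 = levinson_durbin(phi, d)
order = select_order_ic(phi, n, d, c_n=math.log(n))
print(order)
```

With sample autocovariances in place of the exact ones, the same two functions give a data-driven order estimate; the overestimation probability then depends on how fast $C_n$ grows relative to $\log d_n$, as in the proof above.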

5. Proofs of the auxiliary results of Section 4. The following result is 
required for the proofs. 

Lemma 5.1. Let $\{X_k\}_{k \in \mathbb{Z}}$ be an AR($q$) process such that Assumption 2.1
is satisfied. Then:

(i) $\sum_{h=0}^{\infty} |\mathrm{Cov}(X_k, X_{k+h})| < \infty$,

(ii) $\sqrt{n}\,\|\hat\phi_{n,h} - \phi_h\|_p = O(1)$, $p \ge 1$.

Proof. Both properties (i), (ii) follow from Assumption 2.1 via straightforward computations (cf. [17, 23]). □

Recall the notation $\Gamma_m = (\gamma_{i,j})_{1 \le i,j \le m}$ and $\Gamma_m^{-1} = (\gamma^*_{i,j})_{1 \le i,j \le m}$ for the
covariance matrix and its inverse.

Lemma 5.2. Assume that Assumption 2.1 holds. Then for $d_n \le m$ we
have $\|\Gamma_m\|_\infty, \|\Gamma_m^{-1}\|_\infty \le C^* < \infty$, uniformly in $m$.

Proof. Using relation (4.1) and the corresponding notation, one ob- 
tains 



m 

I r;;^ 1 1 oo = o-"^ , max V | (7^7^ - , 



o 

< 2(7 max 

l<j<m 



a dn+i—j 
r=0 r=P 



m 



\h\<mr=0 \r=0 / 

where $\theta_h = 0$ for $h < 0$. Due to Assumption 2.1, the above expression is
finite, hence the first claim follows. In order to establish the result for $\Gamma_m$,
note that

$$\|\Gamma_m\|_\infty = \max_{1 \le j \le m} \sum_{i=1}^{m} |\gamma_{i,j}| \le 2 \sum_{h=0}^{\infty} |\phi_h| < \infty$$

by Lemma 5.1(i), which yields the claim. □

We can now prove Lemma 4.3, which we reformulate below for the sake 
of readability. 

Lemma 5.3. Suppose that $\inf_h \gamma^*_{h,h} > 0$ and Assumption 2.1 holds. Then:



(i) lim P\ max a ^ 

n—>-oo \ 0<h<dn 



(h) 
k 

k=l 



<Un = exp(-exp(-x)), 



(ii) Vra 1 1 - *d J I oo = Op ( \/log C?n ) • 

Proof. We will first show (i). Using the notation established in Sec- 
tion 4, we have 

/ dn oo \ OO 

tW _ ^ I ST J*)' 



\j=l i=0 / r=l 



where a*^ = Yl{i>o,j>o,i+j=r}lh,j'^i- Let < 6 < 5* , and put m„ 
Then it follows from Lemma 5.2 that 



oo oo 



sup X Kh\ <C = - ^")"'') = ^("^n'^)- 



r—rrin i=mri—dn 



Due to Assumption 2.1, one may thus repeat the (quite lengthy) proof of
Theorem 1 (see also Remark 2) in [48] to obtain the result. In fact, the
present case is easier to handle, since $\{U_k^{(h)}\}_{k \in \mathbb{N}}$ is a martingale sequence.
Assertion (ii) follows directly from Theorem 1 in [48]. □

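We can now prove Lemma 4.1, which we restate for the sake of readability.

Lemma 5.4. If Assumption 2.1 holds, we have for $\chi_1 > 0$

$$P\big(\|\hat\Gamma_{d_n}^{-1} - \Gamma_{d_n}^{-1}\|_\infty > (\log n)^{-\chi_1}\big) = O\Big(\frac{(d_n(\log n)^{\chi_1})^p}{n^{p/2}}\Big).$$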



Proof. We introduce the following abbreviations. Put

$$E = \|\Gamma_{d_n}^{-1}\|_\infty, \quad F = \|\hat\Gamma_{d_n}^{-1} - \Gamma_{d_n}^{-1}\|_\infty, \quad G = \|\hat\Gamma_{d_n} - \Gamma_{d_n}\|_\infty.$$

Due to the stationarity of $\{X_k\}_{k \in \mathbb{Z}}$ it follows that

(5.1) $G = \|\hat\Gamma_{d_n} - \Gamma_{d_n}\|_\infty \le 2 \sum_{h \le d_n} |\hat\phi_{n,|h|} - \phi_{|h|}|,$

and thus an application of the Hölder and Minkowski inequalities yields

(5.2) $E(|G|) \le 2 \sum_{h \le d_n} \|\hat\phi_{n,|h|} - \phi_{|h|}\|_p.$

Due to Lemma 5.1(ii) we have $\sqrt{n}\,\|\hat\phi_{n,|i-j|} - \phi_{|i-j|}\|_p \le C_p$ for some finite
constant $C_p$, thus the Markov inequality in connection with Minkowski's
inequality implies

(5.3) $P\big(\|\hat\Gamma_{d_n} - \Gamma_{d_n}\|_\infty > (\log n)^{-\chi_1}\big) = O\Big(\Big(\frac{d_n(\log n)^{\chi_1}}{\sqrt{n}}\Big)^p\Big).$

Due to the sub-multiplicativity of the matrix norm $\|\cdot\|_\infty$, proceeding as
in Lemma 3 in [7] one obtains

$$F \le (E + F)GE,$$

and in particular if $EG < 1$

$$F \le E^2 G/(1 - EG).$$

Since we have $E < \infty$ due to Lemma 5.2, we deduce that for sufficiently
large $n$

$$P(F > \epsilon) \le P(G > (\log n)^{-1}) + P(G > E^{-2}\epsilon/2).$$

Choosing $\epsilon = (\log n)^{-\chi_1}$, the claim follows. □
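
The bound $F \le E^2 G/(1 - EG)$ is a general fact about inverses under any sub-multiplicative norm, so it can be checked numerically. The following Python sketch does this for a fixed 2×2 example in the maximum-row-sum norm; the matrices are made-up illustrations, with `gamma` playing the role of $\Gamma_{d_n}$ and `gamma_hat` that of its estimate.

```python
def inf_norm(m):
    """Maximum absolute row sum of a matrix given as a list of rows."""
    return max(sum(abs(v) for v in row) for row in m)

def inverse_2x2(m):
    """Closed-form inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def subtract(m1, m2):
    """Entrywise difference m1 - m2."""
    return [[u - v for u, v in zip(r1, r2)] for r1, r2 in zip(m1, m2)]

gamma = [[2.0, 1.0], [1.0, 2.0]]          # stands in for Gamma
gamma_hat = [[2.05, 0.98], [0.98, 2.05]]  # stands in for its estimate

E = inf_norm(inverse_2x2(gamma))
G = inf_norm(subtract(gamma_hat, gamma))
F = inf_norm(subtract(inverse_2x2(gamma_hat), inverse_2x2(gamma)))
bound = E ** 2 * G / (1.0 - E * G)
print(E * G < 1.0, F <= bound)
```

The condition $EG < 1$ is what the event $\{G \le (\log n)^{-1}\}$ guarantees for large $n$ in the proof.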

We are now in the position to show Lemma 4.1. Recall that we have 

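(5.4) $\mathbf{Y} = \mathbf{X}\Theta_{d_n} + \mathbf{Z}$,

where $\mathbf{Y} = (X_1, \ldots, X_n)^T$, $\mathbf{Z} = (\varepsilon_1, \ldots, \varepsilon_n)^T$, and $\mathbf{X}$ is the $n \times d_n$ design
matrix.

We introduce the estimator $\tilde\Theta = (\tilde\theta_1, \ldots, \tilde\theta_{d_n})^T$ via

(5.5) $\tilde\Theta = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}$.

Remark 5.5. It is evident from the proof that Lemma 5.4 remains valid
if one replaces $\hat\Gamma_{d_n}^{-1}$ with $n(\mathbf{X}^T\mathbf{X})^{-1}$, which in fact is the better estimator.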

Proposition 5.6. Let $\{X_k\}_{k \in \mathbb{Z}}$ be an AR($d_n$) process, such that the
assumptions of Theorem 2.2 are satisfied. Then

P{\\V^{® - 0)|L > (logn)-^') = 0((logn)»P/2^-P/^</4+^) + o(l). 

Proof. Following the proof of [17], Theorem 8.10.1, we have the follow- 
ing decomposition: 

-&) = ^A^f ;„'($,„ - n-^X^Y) + n'/\f-' - n{X.^JQ)-')n-VY. 




For the ith component of ^/n{^dn ~ ''^ ^^"^Y), which we denote with Tj, 
we have 

/ n—i \ 



n 



-1/2 



k+i 



k=l- 



k=l 



Using the Minkowski and the Cauchy-Schwarz inequalities we get

/ n~i 

^-1/2 ^ XkXk+i + V^n{il-n~h)Xn-n-'Y,{Xk + Xk+i 



k=l 



p/2 



< 



l-i\ 



n 







11 -il 



-1/2 



fc=l-i 







-1/2 



+ n 

p/2 k=l-i 



-|- 1 1 1 1 p ^ll^nllp ■ 

- ^r) ■ 



n 



-1/2 



n 



-1/2 



fc=l 



Since < i < d„, we obtain from Lemma 5.1 that = 0{n ^^"^d^"^), and 
hence by the Markov inequahty 



(5.6) 



n-^yJY)\\^>e) 



<^P{\Ti\ >e) = 0{e-P/'^n-P/^<Fj^+^). 



1=1 



(5.7) 



Put Bn = — n ""^X^Y. Then by adding and subtracting T ^ ^ we obtain 

P(V^||f ;>„||oo >e)< P{M\(^2 - r^„')^nlloo > e/2) 

+ P(V^||r^XlU>e/2). 
In order to control the first expression, note that 

P{M\{K - ^dl)Bn\\oo > e/2) < P{\\r;' - T~^\U > e/2) 

+ Pi\\Bn\\oo>e), 

which by Lemma 5.4 and (5.6) is of the magnitude 0{e~^^'^n~^^'^dn^^~^^)- 
Moreover, since $\|\Gamma_{d_n}^{-1}\|_\infty < \infty$ by Lemma 5.2, the bound in (5.6) implies

that for some $C > 0$

P{V^\\T-]BJ^ > e/2) < P{V^\\BJ^ > eC-i) = 0{e-P/^n-P/'dPj'+'), 

hence we conclude that 

(5.8) P(|| V^r-\$,„ - n-iX^Y)||^ >e) = O^e-^l^n^I'd^^'^^). 
We will now treat the second part, which we rewrite as 

ni/2(f - n(X^X)-i)(n~iX^Y - n-^E(X^Y)) 
+ nV2(f-^-n(X^X)-^)n-iE(X^Y) 

= : Cn + Dn ■ 

Due to Lemma 5.3 (requires an easy adaption), we have 

(5.9) \\n-'^/VY - n-i/2]E(x^Y)||oo = Op{\ogn). 
Moreover, it holds that 

V^{V~l - n(X^X)-i) = f ;JV^(n-i(X^X) - f ,Jn(X^X)-i, 
and thus the sub-multiplicativity of the matrix norm $\|\cdot\|_\infty$ implies

||C„||oo<||r;„'||oo||V^(n-i(X^X)-f,J||^||n(X^X)-i|U 

X ||n-V2x^Y-n-i/2E(X^Y)||oo. 
Using (5.9) we thus obtain 

i^(||C„,||oo>e) 

<o(l) + P(||f;;|U||^A^(n-i(X^X)-f,J|L 
X ||n(X^X)^^||oologn > e). 

Put A„ = n~^(X-^X) — Td^. By adding and subtracting F^^ and using 
Lemma 5.4 (see Remark 5.5) and Lemma 5.2 we obtain 

P(||f;^'||oo||A„||oo||n(X^X)-i|Ulogn>e) 

< 2P(||A„|Ulogn > 1) + P{\\V~l - f ,„'||oo > e) 

+ P(||r,-^-n"i(X^X)|U>e). 
Choosing e = (log?i)~^\ Lemma 5.4 and (5.6) thus yield the bound 

(5.10) P(||f;J|U||A„|Ulogn>(logn)->^i) = 0((logn)>^iPn-P/X/')- 
Piecing everything together, the claim follows. □ 

We are now in the position to prove Lemma 4.2.

Proof of Lemma 4.2. We have that 
P{\\n^/^{@ - 0) - n-V2r-ixTz||^ > 2e) 

< P(||ni/2(0 - 0)11^ >e)+ P(||nV2(0 _ 0) _ n-'/^T~Vz\\^ > e). 
Setting $\epsilon = (\log n)^{-\chi_1}$, $\chi_1 > 2$, Proposition 5.6 implies that

||ni/2(0-0)|U=Op(logn-»). 
Moreover, the proof of Proposition 5.6 gives us 

(5.11) nV2(0 _ 0) _ n~^I^V-VZ = (n(X^X)-i - V~^)n-^I^^^Z, 
and hence Remark 5.5 and Lemma 5.3 imply that 

||ni/2(0_0)_„-i/2r-ix^Z|U 

(5.12) < \\n{yJX)'^ - r"i||oo||n~^/2x'^Z||oo 

which completes the proof. □ 



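6. Gaussian approximation. In this section we obtain, under suitable assumptions, a normal approximation for the quantity $n^{-1/2}\Gamma_{d_n}^{-1}\mathbf{X}^T\mathbf{Z}$, where
we use the notation introduced in Section 4. This entitles us to obtain
a quantitative version of Theorem 2.2 under stronger conditions. Let
$\mathbf{V}_k = (X_{k-1}, \ldots, X_{k-d_n})^T \varepsilon_k$, $k \in \mathbb{N}$. We have

$$n^{-1/2}\Gamma_{d_n}^{-1}\mathbf{X}^T\mathbf{Z} = n^{-1/2}\Gamma_{d_n}^{-1}\sum_{k=1}^{n} \mathbf{V}_k = n^{-1/2}\sum_{k=1}^{n} \mathbf{U}_k,$$

where $\mathbf{V}_k = (V_k^{(1)}, \ldots, V_k^{(d_n)})^T$, $\mathbf{U}_k = \Gamma_{d_n}^{-1}\mathbf{V}_k$, $\mathbf{U}_k = (U_k^{(1)}, \ldots, U_k^{(d_n)})^T$. Note that $\{\mathbf{V}_k\}_{k \in \mathbb{N}}$ and $\{\mathbf{U}_k\}_{k \in \mathbb{N}}$
are both martingale sequences. In particular, it holds that $E(\mathbf{V}_k) = E(\mathbf{U}_k) = 0$
and

(6.1) $E(\mathbf{V}_k \mathbf{V}_{k+h}^T) = \sigma^2\Gamma_{d_n}$ if $h = 0$, and $E(\mathbf{V}_k \mathbf{V}_{k+h}^T) = 0$ if $h \ne 0$,

since $\varepsilon_k$ is independent of $\{X_{k-i}\}_{i \ge 1}$. Throughout this section, we will always
assume that $d_n = O(n)$.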

The main theorem is formulated below. 

Theorem 6.1. Suppose that Assumption 2.7 holds. If $d_n = O(n^{\delta})$ with
$\delta < 1/7$, then on a possibly larger probability space, there exists a $d_n$-dimensional Gaussian random vector $\mathbf{Z}$ with covariance matrix $\Gamma_Z$, such that



P\ 



n 

fc=l 



where = ^/n{\ogn) ^'-^ , for arbitrary v, xs > 0, and max ||n ^Tz — cr^Td^ \ 
o{d~'). 
Remark 6.2. If one succeeds in establishing a quantitative version of
Lemma 21 in [48] with an appropriate error bound, corresponding results
to Theorem 6.1 with $0 < \delta < 1$ should be possible. This, however, is beyond
the scope of the present paper.

The proof of Theorem 6.1 partially follows [8], Theorem 4.1, and is based
on a series of lemmas. To this end, we require some preliminary notation. For a $d$-dimensional vector $x = (x_1, \ldots, x_d)$, we denote with $|x|_d = (\sum_{i=1}^{d} (x_i)^2)^{1/2}$ the usual Euclidean norm. The following coupling inequality
is due to Berthet and Mason [10].

Lemma 6.3 (Coupling inequality). Let $X_1, \ldots, X_N$ be independent, mean-zero random vectors in $\mathbb{R}^n$, $n \ge 1$, such that for some $B > 0$, $|X_i|_n \le B$,
$i = 1, \ldots, N$. If the probability space is rich enough, then for each $\delta > 0$,
one can define independent normally distributed mean-zero random vectors
$\xi_1, \ldots, \xi_N$ with $\xi_i$ and $X_i$ having the same variance/covariance matrix for
$i = 1, \ldots, N$, such that for universal constants $C_1 > 0$ and $C_2 > 0$,

$$P\Big\{\Big|\sum_{i=1}^{N} (X_i - \xi_i)\Big|_n > \delta\Big\} \le C_1 n^2 \exp\Big(-\frac{C_2 \delta}{B n^2}\Big).$$

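The proof of Theorem 6.1 is based on a blocking argument, which in turn
requires carefully truncated random variables. Put

$$n^{-1/2}\Gamma_{d_n}^{-1}\mathbf{X}^T\mathbf{Z} = n^{-1/2}\Gamma_{d_n}^{-1}\sum_{k=1}^{n} \mathbf{V}_k = n^{-1/2}\sum_{k=1}^{n} \mathbf{U}_k,$$

where $\mathbf{U}_k = (U_k^{(1)}, \ldots, U_k^{(d_n)})^T$. Note that $\{\mathbf{V}_k\}$ and $\{\mathbf{U}_k\}$ are both martingale
sequences.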



Lemma 6.4. Suppose that Assumption 2.7 holds. Then for q > 3: 

fc=i 



n 



n 



0(n 



(ii) P(V^||$rf„-*rfJ|oo> V^log^) = 0(n-'^) 
for arbitrary > 0. 

Proof. We first show (i). By Lemma 1 in [47] we have 



n 



-1/2 



k=l 



>\/glogn <E^| 



n 



{h) 

k 



h=l 

0{dnn-') 



k=l 



> Vglog 



n 
for arbitrary $\nu > 0$, hence the claim. Part (ii) can be shown in the same way,
using Theorem 3 in [47] instead of Lemma 1. □

Lemma 6.5. If Assumption 2.7 is valid, then there exists a sequence of
random vectors $\mathbf{U}_k^* = (U_k^{(1,*)}, \ldots, U_k^{(d_n,*)})^T$ with $E(\mathbf{U}_k^*) = 0$ and the same covariance structure as $\mathbf{U}_k$, such that $\{\mathbf{U}_k^*\}$ is a $d_n$-dependent sequence,

maxi<fc<n|C/;['''*^| = C'(fe^), 1 < /i < dn, and 



P n-'/^ 



k=l 



k=l 



where Vn = -v/n(logn) ^'^ for arbitrary xs > 0. 



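Proof. Put

(6.2) $\varepsilon_{k,b_n} = \varepsilon_k \mathbf{1}_{\{|\varepsilon_k| \le b_n\}} - E\big(\varepsilon_k \mathbf{1}_{\{|\varepsilon_k| \le b_n\}}\big)$

and let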



(h) 



I. J-r 



k,hn k -^max|,|<. 



^!^\l\<n\'^l\<bn I 



I dn OO 



^i=i i=0 



Denote with Ufc,,„ = (C/ii , . . . , U'^^lf ^ then 



0\T. 



Yyk-Y.'^KK 



k=l 



k=l 



> Vr, 



(h) 



<P(max|eH >&n) + P(|^A^E(U^ij| > (logn)-^^) 



Since $E(U_k^{(h)}) = 0$, an application of the Cauchy-Schwarz inequality yields



|V^E(uiX)l<^/^l|U 



W II 111 



>bnll2 = /nP(max|e/| > 



\|«|<n 



which by Assumption 2.7 is of the magnitude $O(n^{-\nu})$, for arbitrary $\nu > 0$.
Hence we conclude



(6.3) 



Put U 



(dn) 

k,b„ 



p 



^Vk-^lJk,h 

k=l k=l 

{d„,dn)\T 



o(n-n. 



k,b„ ' • • • ' ^kA 



y . Then 



^k,b! = ^kK ^rh*]^Oii^k-j-i,b 
\j=l 1=0 
By Lemma 6.4 (remains valid) we have that 



P 



fc=i 

h=Q \ 



> Vr, 



n 



-1/2 Tj{h) _ Tr{h,d 



T{h,dn) 



k=l 



for arbitrary $\nu > 0$. Let $\{\varepsilon_{k,b_n}^{(h)}\}$, $1 \le h \le d_n$, be an array of mutually
independent random variables, where $\varepsilon_{k,b_n}^{(h)}$ is an independent copy of $\varepsilon_{k,b_n}$
for each $h$. Then we can define the random vectors



{h,dri,*) 



dn 



dn 



i=d„+l 



Note that due to the structure of U, 



i=0 

{h,dn,*) 

k,b„ 



it is clear that one may repeat 



all the previous arguments to derive the bound 



(6.4) 



n 



-1/2 



{h,dn,*) 

k,b„ 



k=l 



k=l 



O pin- 



Let (7* = Var(efc b^). Since cr* > for large enough n, the Cauchy-Schwarz 
inequality and Assumption 2.1 imply 



<C\\el\\2^P{\ek\>bn) = 0{n-^). 
Then we obtain from the above and Lemma 6.4 (remains valid) 



(6.5) 



(i-.v<)Ef^a 



{h.dn,*) 



k=l 



Put VI 



^^'*\ . . . ,ujf'"'*^)'^ . Then it is clear that m.aK.i<k<n\uj!^'*^\d 



0{bi), 1 <h<dn, and piecing everything together, the claim follows. □ 

We will now construct an approximation for the random vector $\mathbf{U}_k^*$.
To this end, we first divide the set of integers $\{1, 2, \ldots\}$ into consecutive
blocks $H_1, J_1, H_2, J_2, \ldots$. The blocks are defined by recursion. Fix $\delta^* > \delta > 0$, and put $m_n = [n^{\delta^*}]$. If the largest element of $J_{i-1}$ is $k_{i-1}$, then
$H_i = \{k_{i-1} + 1, \ldots, k_{i-1} + m_n\}$ and $J_i = \{k_{i-1} + m_n + 1, \ldots, k_{i-1} + m_n + d_n\}$. Let $|\cdot|$
denote the cardinality of a set. It follows from the definition of $H_i$, $J_i$ that
$|H_i| = m_n$ and $|J_i| = d_n$. Note that the total number of blocks is approximately $n/m_n$.

Let $\mathcal{I} \subset \{0, 1, \ldots, d_n\}$ be a subset with $|\mathcal{I}| = \mathfrak{d}$, with
$\mathfrak{d} = O(n^{\lambda})$, $\lambda > 0$, and denote with $\sigma^2\Gamma_{\mathcal{I}}$ the sub-covariance matrix of $\mathbf{U}_k$
restricted to the subset $\mathcal{I}$.


Lemma 6.6. If Assumption 2.7 is valid and 5\ + 25* < 1, then on a pos- 
sible larger probability space there exists a D- dimensional Gaussian random 
vector Z with covariance matrix nTzx, such that 



P I max 

hex 



k=l 



> Vr, 



Oiexpi-n")), e>0, 



where Vn = ^/n(logn) , for arbitrary X3 ^ 0, and maxUF^i — a'^Tx\ 



Proof. For h£X, let 
and define the vectors 



= (.■■■, ■■ y , hex, and r]^ = 
Note that per construction, we have that {^fejfcgN is a sequence of indepen- 
dent random vectors with = 0{\/dmnbn)- By Lemma 6.3, we can de- 
fine a sequence of independent normal random vectors = (. . . , Cl^'*^ , . . .)^, 
hex, such that for x > 



(h) 



hex. 



P max 

\ l<h<0 



n/m. 



(h) _JK*). 

■3 



h=l 

h=l 
< Ct)^ exp 



{h) _Ah,*)^ 



> X 



We thus obtain 



(6.6) 



P I max 

l<h<i) 



\ 1 



n/rrin 



{h) Jh,*) 



n/rrin 



0(exp(-n^)), 



and similar arguments show that there exists a sequence of independent 
normal random vectors r/^ = (. . . , r]k^'*\ ■ ■ ■)'^ , such that 



P max 

\ K/iO 



n/nin 



>vn =0(exp(-n^)). 
Lemma 6.5 yields that Var(7/y 



0{dn) for all j < nin, 1 <h <(). Hence 



by known properties of the tails of a normal c.d.f., we obtain that 



P max 



n/ mn 



>Vn]<^Pi 



h=l 



n/ mn 



> Vr, 



(6.7) 



< dPi\Z\ > Cy/dJ^{\ogny^^) 
= 0(exp(-n")) 

for some e > 0. This yields 

n/m,i 

W , Ah) Ah,*) 



(6.8) P\ max 

\ l<h<I) 



E + - ^j) >^n]= 0(exp(-n^)). 
j=i J 

Let r]l* = (..., ..y h el he a. copy of r/^. such that rj** and ^* are 

independent for j. By the very construction of ^^^Vki it is not hard to 
show that 



max 



(n/rrin n/rrin N 

k=l k=l / 



Gov 



(n/m.„ n/m„ \ 

Etf'*^+^'"^E^i'"^+^i'"M 
A:=l k=l J 



= 0{n/mn), 

which clearly implies max||r^^x — (T^rx|| = 0(m~^). Hence, by enlarging the 
probability space if necessary and arguing similarly as in (6.7), we have that 

,Ah) . (h) Jh.*) 



P max 

\ K/iO 



i=i 



>Vn =0{exp{-n')). 



Finally, we obtain from the above 

/ n n/rrin 

Eui.- E(^.*-^, 



P max 

\ h€l 



k=l j=l 

which completes the proof. □ 



0(exp(-n^)), 



Proof of Theorem 6.1. By Lemma 6.5 it suffices to establish the
claim for $\{\mathbf{U}_k^*\}_{1 \le k \le n}$. This, however, is provided by Lemma 6.6. □